EXHIBIT 8 






(43) Date of publication: 

12.04.2000 Bulletin 2000/15 



Europäisches Patentamt
European Patent Office

Office européen des brevets (11)

EUROPEAN PATENT APPLICATION 

(51) Int. Cl.7: C07H 21/00, C12Q 1/68



EP 0 992 511 A1 



(21) Application number: 99113790.2 

(22) Date of filing: 23.01.1997 



(84) Designated Contracting States:

AT BE CH DE DK ES FI FR GB GR IE IT LI LU MC
NL PT SE

(30) Priority: 23.01.1996 US 589260 
23.01.1996 US 10462 P 

(62) Document number(s) of the earlier application(s) in
accordance with Art. 76 EPC:
97905634.8 / 0 868 535

(71) Applicant: Rapigene, Inc.
Bothell, Washington 98021 (US) 

(72) Inventors: 

• Howbert, J. Jeffry 
Washington 98005 (US) 



• Mulligan, John T. 
Washington 98105 (US) 

• Tabone, John C. 
Washington 98011 (US)

• Van Ness, Jeffrey 
Washington 98125 (US)

(74) Representative: 

Gowshall, Jonathan Vallance 
FORRESTER & BOEHMERT 
Franz-Joseph-Strasse 38 
80801 München (DE)

Remarks: 

This application was filed on 14.07.1999 as a
divisional application to the application mentioned 
under INID code 62. 



(54) Methods and compositions for determining the sequence of nucleic acid molecules



(57) Methods and compounds, including compositions therefrom, are provided for determining the sequence of nucleic acid molecules. The methods permit the determination of multiple nucleic acid sequences simultaneously. The compounds are used as tags to generate tagged nucleic acid fragments which are complementary to a selected target nucleic acid molecule. Each tag is correlative with a particular nucleotide and, in a preferred embodiment, is detectable by mass spectrometry. Following separation of the tagged fragments by sequential length, the tags are cleaved from the tagged fragments. In a preferred embodiment, the tags are detected by mass spectrometry and the sequence of the nucleic acid molecule is determined therefrom. The individual steps of the methods can be used in an automated format, e.g., by incorporation into systems.






Applicants: Jingyue Ju et al.
Serial No.: 10/702,203
Filed: November 4, 2003




Description 
TECHNICAL FIELD 

[0001] The present invention relates generally to methods and compositions for determining the sequence of nucleic acid molecules, and more specifically, to methods and compositions which allow the determination of multiple nucleic acid sequences simultaneously.

BACKGROUND OF THE INVENTION 

[0002] Deoxyribonucleic acid (DNA) sequencing is one of the basic techniques of biology. It is at the heart of molecular biology and plays a rapidly expanding role in the rest of biology. The Human Genome Project is a multinational effort to read the entire human genetic code. It is the largest project ever undertaken in biology, and has already begun to have a major impact on medicine. The development of cheaper and faster sequencing technology will ensure the success of this project. Indeed, a substantial effort has been funded by the NIH and DOE branches of the Human Genome Project to improve sequencing technology; however, this effort has not had a substantial impact on current practices (Sulston and Waterston, Nature 376:175, 1995).

[0003] In the past two decades, determination and analysis of nucleic acid sequences have formed one of the building blocks of biological research. This, along with new investigational tools and methodologies, has allowed scientists to study genes and gene products in order to better understand the function of these genes, as well as to develop new therapeutics and diagnostics.

[0004] Two different DNA sequencing methodologies that were developed in 1977 are still in wide use today. Briefly, the enzymatic method described by Sanger (Proc. Natl. Acad. Sci. (USA) 74:5463, 1977), which utilizes dideoxy terminators, involves the synthesis of a DNA strand from a single-stranded template by a DNA polymerase. The Sanger method of sequencing depends on the fact that dideoxynucleotides (ddNTPs) are incorporated into the growing strand in the same way as normal deoxynucleotides (albeit at a lower efficiency). However, ddNTPs differ from normal deoxynucleotides (dNTPs) in that they lack the 3'-OH group necessary for chain elongation. When a ddNTP is incorporated into the DNA chain, the absence of the 3'-hydroxy group prevents the formation of a new phosphodiester bond, and the DNA fragment is terminated with the ddNTP complementary to the base in the template DNA. The Maxam and Gilbert method (Maxam and Gilbert, Proc. Natl. Acad. Sci. (USA) 74:560, 1977) employs chemical degradation of the original DNA (in both cases the DNA must be clonal). Both methods produce populations of fragments that begin from a particular point and terminate at every base that is found in the DNA fragment that is to be sequenced. The termination of each fragment is dependent on the location of a particular base within the original DNA fragment. The DNA fragments are separated by polyacrylamide gel electrophoresis, and the order of the DNA bases (adenine, cytosine, thymine, guanine; also known as A, C, T, G, respectively) is read from an autoradiograph of the gel.
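The chain-termination principle can be illustrated with a toy simulation in Python. It is a sketch under simplifying assumptions that are not part of the text: every template position gives exactly one terminated fragment, strand orientation is ignored (the template string is simply walked left to right), and sorting by length stands in for gel electrophoresis. All names are hypothetical.

```python
# Toy model of Sanger dideoxy chain termination (illustrative only).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_fragments(template: str, primer_len: int = 0) -> list[tuple[int, str]]:
    """Return (fragment length, terminating ddNTP base) for each position.

    Assumes one terminated fragment per template position; strand
    orientation is ignored and the template is walked left to right.
    """
    fragments = []
    for i, base in enumerate(template):
        incorporated = COMPLEMENT[base]      # ddNTP paired with the template base
        length = primer_len + i + 1          # primer plus the extended bases
        fragments.append((length, incorporated))
    return fragments


def read_ladder(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by length (the gel's job) and read the terminators."""
    return "".join(base for _, base in sorted(fragments))
```

Sorting an unordered ladder by length recovers the synthesized strand, i.e. the complement of the template, which mirrors the length ordering the gel performs physically.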

[0005] A cumbersome DNA pooling sequencing strategy (Church and Kieffer-Higgins, Science 240:185, 1988) is one of the more recent approaches to DNA sequencing. A pooling sequencing strategy consists of pooling a number of DNA templates (samples) and processing the samples as pools. In order to separate the sequence information at the end of the processing, the DNA molecules of interest are ligated to a set of oligonucleotide "tags" at the beginning. The tagged DNA molecules are pooled, amplified and chemically fragmented in 96-well plates. After electrophoresis of the pooled samples, the DNA is transferred to a solid support and then hybridized with a sequential series of specific labeled oligonucleotides. These membranes are then probed as many times as there are tags in the original pool, producing, in each round of probing, autoradiographs similar to those from standard DNA sequencing methods. Thus each reaction and gel yields a quantity of data equivalent to that obtained from conventional reactions and gels multiplied by the number of probes used. If alkaline phosphatase is used as the reporter enzyme, a 1,2-dioxetane substrate can be used, which is detected in a chemiluminescent assay format. However, this pooling strategy's major disadvantage is that the sequences can only be read by Southern blotting the sequencing gel and hybridizing this membrane once for each clone in the pool.
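The gain from pooling, and its cost of one hybridization round per clone, can be modeled as below. This is a hypothetical sketch, not the Church and Kieffer-Higgins protocol: each band on the blotted membrane is represented as a (length, base, tag) tuple, and a probing round simply selects the bands carrying one tag.

```python
# Hypothetical model of multiplex (pooled) sequencing read-out.
# Each band on the blotted membrane is (length, terminating base, clone tag).

def probe_membrane(bands, tag):
    """One hybridization round: only bands carrying `tag` light up."""
    return [band for band in bands if band[2] == tag]


def demultiplex(bands):
    """Probe once per tag in the pool and read each clone's ladder by length.

    Returns the per-tag reads and the number of probing rounds needed.
    """
    tags = sorted({band[2] for band in bands})
    reads = {}
    for tag in tags:
        hits = sorted(probe_membrane(bands, tag))  # order bands by length
        reads[tag] = "".join(base for _, base, _ in hits)
    return reads, len(tags)


# Two clones run together in one lane, distinguished only by their tags.
pooled_lane = [(1, "A", "tag1"), (1, "G", "tag2"), (2, "T", "tag2"), (2, "C", "tag1")]
reads, rounds = demultiplex(pooled_lane)
# reads == {"tag1": "AC", "tag2": "GT"}; rounds == 2
```

One lane yields as many reads as there are tags, but each extra read costs another full probing round.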

[0006] In addition to advances in sequencing methodologies, advances in speed have occurred due to the advent of automated DNA sequencing. Briefly, these methods use fluorescent-labeled primers, which replace methods that employed radiolabeled components. Fluorescent dyes are attached either to the sequencing primers or to the ddNTP terminators. Robotic components now utilize polymerase chain reaction (PCR) technology, which has led to the development of linear amplification strategies. Current commercial sequencing allows all 4 dideoxy-terminator reactions to be run on a single lane. Each dideoxy-terminator reaction is represented by a unique fluorescent primer (one fluorophore for each base type: A, T, C, G). Only one template DNA (i.e., DNA sample) is represented per lane. Current gels permit the simultaneous electrophoresis of up to 64 samples in 64 different lanes. Different ddNTP-terminated fragments are detected by irradiating the gel lane with light and then detecting the light emitted by the fluorophore. Each electrophoresis step is about 4-6 hours long. Each electrophoresis separation resolves about 400-600 nucleotides (nt); therefore, about 6000 nt can be sequenced per hour per sequencer.
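The throughput figure follows from lanes × nucleotides per run ÷ hours per run. The short calculation below uses only the ranges quoted above; with mid-range values it gives 6400 nt/h, consistent with the rounded figure of about 6000 nt/h.

```python
def nt_per_hour(lanes: int, nt_per_run: float, hours_per_run: float) -> float:
    """Nucleotides read per hour by one multi-lane gel sequencer."""
    return lanes * nt_per_run / hours_per_run


# 64 lanes, 400-600 nt per run, 4-6 hours per run (figures from the text).
worst = nt_per_hour(64, 400, 6)    # fewest nt per run, slowest run
typical = nt_per_hour(64, 500, 5)  # mid-range values
best = nt_per_hour(64, 600, 4)     # most nt per run, fastest run
```

The quoted ranges therefore span roughly 4300 to 9600 nt/h per instrument.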

[0007] The use of mass spectrometry for the study of monomeric constituents of nucleic acids has also been described (Hignite, in Biochemical Applications of Mass Spectrometry, Waller and Dermer (eds.), Wiley-Interscience, Chapter 16, p. 527, 1972). Briefly, for larger oligomers, significant early success was obtained by plasma desorption for protected synthetic oligonucleotides up to 14 bases long, and for unprotected oligonucleotides up to 4 bases in length. As with proteins, the applicability of electrospray ionization mass spectrometry (ESI-MS) to oligonucleotides has been demonstrated (Covey et al., Rapid Comm. in Mass Spec. 2:249-256, 1988). These species are ionized in solution, with the charge residing at the acidic bridging phosphodiester and/or terminal phosphate moieties, and yield in the gas phase multiply charged molecular anions, in addition to sodium adducts.
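In negative-ion ESI the observed species are deprotonated molecular anions, [M - zH]^z-, so charge state z appears at m/z = (M - z·m_p)/z, where m_p ≈ 1.007276 u is the proton mass. The sketch below computes such a charge-state envelope; the neutral mass used in the checks is hypothetical, and sodium adducts are ignored.

```python
PROTON_MASS = 1.007276  # u


def negative_ion_mz(neutral_mass, z):
    """m/z of the [M - zH]^z- ion formed by loss of z protons."""
    if z < 1:
        raise ValueError("charge state must be at least 1")
    return (neutral_mass - z * PROTON_MASS) / z


def charge_envelope(neutral_mass, z_min, z_max):
    """m/z values (rounded to 3 decimals) for a range of charge states."""
    return {z: round(negative_ion_mz(neutral_mass, z), 3) for z in range(z_min, z_max + 1)}
```

Higher charge states pull large oligonucleotides down into a lower m/z range, which is part of why ESI suits them.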

[0008] Sequencing DNA with <100 bases by the common enzymatic ddNTP technique is more complicated than it is for larger DNA templates, so chemical degradation is sometimes employed. However, the chemical decomposition method requires about 50 pmol of radioactive end-labeled material, 6 chemical steps, electrophoretic separation, and film exposure. For small oligonucleotides (<14 nt), the combination of electrospray ionization (ESI) and Fourier transform (FT) mass spectrometry (MS) is far faster and more sensitive. Dissociation products of multiply charged ions measured at high (10^5) resolving power represent consecutive backbone cleavages, providing the full sequence in less than one minute from a sub-picomole quantity of sample (Little et al., J. Am. Chem. Soc. 116:4893, 1994). For molecular weight measurements, ESI/MS has been extended to larger fragments (Potier et al., Nucl. Acids Res. 22:3895, 1994). ESI/FTMS appears to be a valuable complement to classical methods for sequencing and for pinpointing mutations in oligonucleotides as large as 100-mers. Spectral data have recently been obtained by loading 3 x 10'^^ mol of a 50-mer using a more sensitive ESI source (Valaskovic, Anal. Chem. 59:259, 1995).
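The phrase "consecutive backbone cleavages providing the full sequence" can be made concrete. In a complete fragment ladder, neighbouring masses differ by exactly one nucleotide residue, so the mass steps spell out the sequence. The sketch below assumes neutral, charge-deconvoluted masses that all share one fixed end, and uses standard monoisotopic residue masses. The tolerance and function name are illustrative choices, not values from the cited work.

```python
# Monoisotopic masses (u) of DNA nucleotide residues as they occur inside a
# chain, i.e. deoxynucleoside 5'-monophosphate minus H2O.
RESIDUE_MASS = {"A": 313.0576, "C": 289.0464, "G": 329.0525, "T": 304.0461}


def decode_ladder(ladder_masses, tol=0.02):
    """Read a sequence from fragment masses that differ by one residue each.

    Assumes the masses are neutral, share the same fixed end, and come from
    consecutive backbone cleavages (charge states already deconvoluted).
    """
    masses = sorted(ladder_masses)
    sequence = []
    for lighter, heavier in zip(masses, masses[1:]):
        step = heavier - lighter
        base, ref = min(RESIDUE_MASS.items(), key=lambda item: abs(item[1] - step))
        if abs(ref - step) > tol:
            raise ValueError(f"mass step {step:.4f} u matches no nucleotide residue")
        sequence.append(base)
    return "".join(sequence)
```

The sequence is read outward from whichever end is common to all fragments. The four residue masses are at least about 9 u apart, so even modest resolving power separates the steps.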

[0009] The other approach to DNA sequencing by mass spectrometry is one in which DNA is labeled with individual isotopes of an element, and the mass spectral analysis simply has to distinguish the isotopes after a mixture of DNA fragments of different sizes has been separated by electrophoresis. (The approach described above uses the resolving power of the mass spectrometer both to separate and to detect DNA oligonucleotides of different lengths, a difficult proposition at best.) All of the procedures described below employ the Sanger procedure to convert a sequencing primer into a series of DNA fragments that vary in length by one nucleotide. The enzymatically synthesized DNA molecules each contain the original primer, a replicated sequence of part of the DNA of interest, and the dideoxy terminator. That is, a set of DNA molecules is produced that contain the primer and differ in length from each other by one nucleotide residue.
[0010] Brennen et al. (Biol. Mass Spec., New York, Elsevier, p. 219, 1990) have described methods that use the four stable isotopes of sulfur as DNA labels, enabling the detection of DNA fragments that have been separated by capillary electrophoresis. Using the α-thio analogues of the ddNTPs, a single sulfur isotope is incorporated into each of the DNA fragments. Therefore each of the four types of DNA fragments (ddTTP-, ddATP-, ddGTP- and ddCTP-terminated) can be uniquely labeled according to the terminal nucleotide: for example, ³²S for fragments ending in A, ³³S for G, ³⁴S for C, and ³⁶S for T. The fragments are mixed together for electrophoresis; fractions of a few picoliters are obtained from the column by a modified ink-jet printer head and then subjected to complete combustion in a furnace. This process oxidizes the thiophosphates of the labeled DNA to SO2, which is analyzed in a quadrupole or magnetic sector mass spectrometer. The SO2 mass unit representation is 64 for fragments ending in A, 65 for G, 66 for C, and 68 for T. Maintenance of the resolution of the DNA fragments as they emerge from the column depends on taking sufficiently small fractions. Because the mass spectrometer is coupled directly to the capillary gel column, the rate of analysis is determined by the rate of electrophoresis. This process is unfortunately expensive, liberates radioactive gas and has not been commercialized. Two other basic constraints also operate on this approach: (a) no other components with a mass of 64, 65, 66 or 68 (isobaric contaminants) can be tolerated, and (b) the natural abundances (%) of the sulfur isotopes (³²S is 95.0, ³³S is 0.75, ³⁴S is 4.2, and ³⁶S is 0.01) govern the sensitivity and cost. Since ³²S is 95% naturally abundant, the other isotopes must be enriched to >99% to eliminate contaminating ³²S. Isotopes that are <1% abundant are quite expensive to obtain at 99% enrichment; even when ³³S is purified 100-fold, it contains as much or more ³²S as it does ³³S.
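Both the read-out rule and the enrichment problem in this paragraph reduce to small calculations. The sketch below uses hypothetical function names. It maps each collected fraction's SO2 mass to a base using the assignments given above, and it treats "100-fold purification" as a 100-fold increase of the ³³S:³²S ratio, which is a simplifying assumption.

```python
# SO2 mass -> terminating base, as given in the text (32S-A, 33S-G, 34S-C, 36S-T).
SO2_MASS_TO_BASE = {64: "A", 65: "G", 66: "C", 68: "T"}

# Natural abundances (%) of the sulfur isotopes as quoted in the text.
NATURAL_ABUNDANCE = {32: 95.0, 33: 0.75, 34: 4.2}


def read_fractions(so2_masses):
    """Translate the SO2 mass seen in each fraction, in elution order
    (shortest fragment first), into the sequence of terminating bases."""
    return "".join(SO2_MASS_TO_BASE[m] for m in so2_masses)


def label_to_contaminant_ratio(isotope, fold, contaminant=32):
    """Ratio of label isotope to 32S after raising the natural
    isotope:32S ratio `fold` times (a simplified view of enrichment)."""
    return fold * NATURAL_ABUNDANCE[isotope] / NATURAL_ABUNDANCE[contaminant]
```

Starting from 0.75/95, a 100-fold increase leaves the ³³S:³²S ratio at about 0.79, so ³²S still predominates, as the text states.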

[0011] Gilbert has described an automated DNA sequencer (EPA 92108678.2) that consists of an oligomer synthesizer, an array on a membrane, a detector which detects hybridization, and a central computer. The synthesizer synthesizes and labels multiple oligomers of arbitrary predicted sequence. The oligomers are used to probe immobilized DNA on membranes. The detector identifies hybridization patterns and then sends those patterns to a central computer, which constructs a sequence and then predicts the sequence of the next round of synthesis of oligomers. Through an iterative process, a DNA sequence can be obtained in an automated fashion.

[0012] Brennen has described a method for sequencing nucleic acids based on ligation of oligomers (U.S. Patent No. 5,403,708). Methods and compositions are described for forming ligation products hybridized to a nucleic acid template. A primer is hybridized to a DNA template, and then a pool of random extension oligonucleotides is also hybridized to the primed template in the presence of ligase(s). The ligase enzyme covalently ligates the hybridized oligomers to the primer. Modifications permit the determination of the nucleotide sequence of one or more members of a first set of target nucleotide residues in the nucleic acid template that are spaced at intervals of N nucleotides. In this method, a labeled ligation product is formed wherein the position and type of label incorporated into the ligation product provide information concerning the nucleotide residue in the nucleic acid template with which the labeled nucleotide residue is base paired.

[0013] Koster has described a method for sequencing DNA by mass spectrometry after degradation of DNA by an exonuclease (PCT/US94/02938). The method described is simple in that the DNA sequence is directly determined (the Sanger reaction is not used). DNA is cloned into standard vectors, the 5' end is immobilized, and the strands are then sequentially degraded at the 3' end by an exonuclease; the enzymatic products (nucleotides) are detected by mass spectrometry.
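Because the 5' end is immobilized and digestion proceeds from the 3' end, the nucleotides reach the mass spectrometer in 3'-to-5' order. The sketch below models only that ordering and the mass assignment. It assumes the exonuclease releases deoxynucleoside 5'-monophosphates detected as neutral monoisotopic masses, and its names are hypothetical.

```python
# Monoisotopic masses (u) of the free deoxynucleoside 5'-monophosphates.
DNMP_MASS = {"A": 331.0682, "C": 307.0569, "G": 347.0631, "T": 322.0566}


def exonuclease_release(strand_5_to_3):
    """Masses released over time by 3'->5' digestion of a strand whose
    5' end is immobilized: the 3'-terminal nucleotide comes off first."""
    return [DNMP_MASS[base] for base in reversed(strand_5_to_3)]


def identify(mass, tol=0.01):
    """Assign a detected mass to the nearest dNMP, within `tol` u."""
    base, ref = min(DNMP_MASS.items(), key=lambda item: abs(item[1] - mass))
    if abs(ref - mass) > tol:
        raise ValueError(f"{mass:.4f} u is not a dNMP")
    return base


def sequence_from_release(masses_in_time_order):
    """Rebuild the 5'->3' sequence from the time-ordered mass stream."""
    return "".join(identify(m) for m in reversed(masses_in_time_order))
```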

[0014] Weiss et al. have described an automated hybridization/imaging device for fluorescent multiplex DNA sequencing (PCT/US94/11918). The method is based on the concept of hybridizing enzyme-linked probes to a membrane containing size-separated DNA fragments arising from a typical Sanger reaction.

[0015] The demand for sequencing information is larger than can be supplied by currently existing sequencing machines, such as the ABI 377 and the Pharmacia ALF. One of the principal limitations of the current technology is the small number of tags which can be resolved using the current tagging system. The Church pooling system discussed above uses more tags, but the use and detection of these tags is laborious.

[0016] The present invention discloses novel compositions and methods which may be utilized to sequence nucleic acid molecules with much greater speed and sensitivity than the methods described above, and further provides other related advantages.

SUMMARY OF THE INVENTION 


[0017] Briefly stated, the present invention provides methods, compounds, compositions, kits and systems for determining the sequence of nucleic acid molecules. Within one aspect of the invention, methods are provided for determining the sequence of a nucleic acid molecule. The methods include the steps of: (a) generating tagged nucleic acid fragments which are complementary to a selected target nucleic acid molecule, wherein a tag is correlative with a particular nucleotide and detectable by non-fluorescent spectrometry or potentiometry; (b) separating the tagged fragments by sequential length; (c) cleaving the tags from the tagged fragments; and (d) detecting the tags by non-fluorescent spectrometry or potentiometry, and therefrom determining the sequence of the nucleic acid molecule. In preferred embodiments, the tags are detected by mass spectrometry, infrared spectrometry, ultraviolet spectrometry or potentiostatic amperometry.
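Steps (a) to (d) can be strung together in a toy model. This is a hypothetical sketch, not the claimed method. The tag masses are invented (one per nucleotide), chain termination is idealized as one fragment per position, and "detection" is an exact mass lookup.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hypothetical tag masses (u), one per nucleotide, spaced 4 u apart.
TAG_MASS = {"A": 300.0, "C": 304.0, "G": 308.0, "T": 312.0}
MASS_TO_BASE = {mass: base for base, mass in TAG_MASS.items()}


def generate_tagged_fragments(target):
    """Step (a): one fragment per chain-termination position, tagged according
    to the nucleotide (complementary to the target) at which it terminates."""
    return [
        {"length": i + 1, "tag_mass": TAG_MASS[COMPLEMENT[base]]}
        for i, base in enumerate(target)
    ]


def separate_by_length(fragments):
    """Step (b): order fragments by length, as a separation step would."""
    return sorted(fragments, key=lambda frag: frag["length"])


def cleave_tags(fragments):
    """Step (c): release each tag; only its mass is needed downstream."""
    return [frag["tag_mass"] for frag in fragments]


def detect_and_call(tag_masses):
    """Step (d): identify each tag by mass and translate it to a nucleotide."""
    return "".join(MASS_TO_BASE[mass] for mass in tag_masses)


fragments = generate_tagged_fragments("TACG")
random.Random(0).shuffle(fragments)  # fragments start out unordered
called = detect_and_call(cleave_tags(separate_by_length(fragments)))
# called == "ATGC", the strand complementary to the target
```

The design point the model makes is that separation (step b) fixes the order, while the cleaved tag (steps c and d) alone identifies the nucleotide at each position.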

[0018] In another aspect, the invention provides a compound of the formula:

T^ms-L-X

wherein T^ms is an organic group detectable by mass spectrometry, comprising carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen, nitrogen, sulfur, phosphorus and iodine; L is an organic group which allows a T^ms-containing moiety to be cleaved from the remainder of the compound, wherein the T^ms-containing moiety comprises a functional group which supports a single ionized charge state when the compound is subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and organic acid; X is a functional group selected from hydroxyl, amino, thiol, carboxylic acid, haloalkyl, and derivatives thereof which either activate or inhibit the activity of the group toward coupling with other moieties, or is a nucleic acid fragment attached to L at other than the 3' end of the nucleic acid fragment; with the provisos that the compound is neither bonded to a solid support through X nor has a mass of less than 250 daltons.
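The compositional rule for T^ms and the two provisos lend themselves to a simple check. In this sketch (all names hypothetical), formulas are dictionaries of element counts. The compound's mass is taken as its monoisotopic mass, which is an assumption, since the text only says "daltons".

```python
# Monoisotopic atomic masses (u).
MONO_MASS = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915,
             "F": 18.998403, "S": 31.972071, "P": 30.973762, "I": 126.904473}
OPTIONAL_ELEMENTS = {"O", "N", "S", "P", "I"}


def tms_composition_ok(tms_formula):
    """T^ms must contain C, at least one of H and F, and otherwise only
    O, N, S, P or I. `tms_formula` maps element symbol -> atom count."""
    present = {el for el, n in tms_formula.items() if n > 0}
    return ("C" in present
            and bool(present & {"H", "F"})
            and present <= {"C", "H", "F"} | OPTIONAL_ELEMENTS)


def monoisotopic_mass(formula):
    """Sum of monoisotopic atomic masses for an element-count dictionary."""
    return sum(MONO_MASS[el] * n for el, n in formula.items())


def meets_provisos(tms_formula, compound_formula, bonded_to_support_via_x=False):
    """Check the composition rule and the two provisos of the claimed compound.

    Not checked: the charge-bearing functional group (tertiary amine,
    quaternary amine or organic acid) of the T^ms-containing moiety.
    """
    return (tms_composition_ok(tms_formula)
            and not bonded_to_support_via_x
            and monoisotopic_mass(compound_formula) >= 250.0)
```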

[0019] In another aspect, the invention provides a composition comprising a plurality of compounds of the formula T^ms-L-MOI, wherein T^ms is an organic group detectable by mass spectrometry, comprising carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen, nitrogen, sulfur, phosphorus and iodine; L is an organic group which allows a T^ms-containing moiety to be cleaved from the remainder of the compound, wherein the T^ms-containing moiety comprises a functional group which supports a single ionized charge state when the compound is subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and organic acid; MOI is a nucleic acid fragment wherein L is conjugated to the MOI at a location other than the 3' end of the MOI; and wherein no two compounds have either the same T^ms or the same MOI.

[0020] In another aspect, the invention provides a composition comprising water and a compound of the formula T^ms-L-MOI, wherein T^ms is an organic group detectable by mass spectrometry, comprising carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen, nitrogen, sulfur, phosphorus and iodine; L is an organic group which allows a T^ms-containing moiety to be cleaved from the remainder of the compound, wherein the T^ms-containing moiety comprises a functional group which supports a single ionized charge state when the compound is subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and organic acid; and MOI is a nucleic acid fragment wherein L is conjugated to the MOI at a location other than the 3' end of the MOI.
[0021] In another aspect, the invention provides for a composition comprising a plurality of sets of compounds, each set of compounds having the formula T^ms-L-MOI, wherein T^ms is an organic group detectable by mass spectrometry, comprising carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen, nitrogen, sulfur, phosphorus and iodine; L is an organic group which allows a T^ms-containing moiety to be cleaved from the remainder of the compound, wherein the T^ms-containing moiety comprises a functional group which supports a single ionized charge state when the compound is subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and organic acid; MOI is a nucleic acid fragment wherein L is conjugated to the MOI at a location other than the 3' end of the MOI; wherein within a set, all members have the same T^ms group, and the MOI fragments have variable lengths that terminate with the same dideoxynucleotide selected from ddAMP, ddGMP, ddCMP and ddTMP; and wherein between sets, the T^ms groups differ by at least 2 amu.
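The within-set and between-set rules can be expressed as a validation function. It is an illustrative sketch: compounds are reduced to (T^ms mass, terminator, MOI length) tuples, and the 2 amu threshold is taken directly from the text.

```python
from itertools import combinations

DIDEOXY = {"ddAMP", "ddGMP", "ddCMP", "ddTMP"}


def sets_are_valid(sets, min_gap=2.0):
    """Check the set rules for tagged compounds.

    `sets` is a list of sets; each set is a list of
    (tms_mass, terminating_dideoxynucleotide, moi_length) tuples.
    Within a set: one T^ms mass and one terminator from DIDEOXY.
    Between sets: T^ms masses differ by at least `min_gap` amu.
    """
    set_masses = []
    for members in sets:
        masses = {mass for mass, _, _ in members}
        terminators = {term for _, term, _ in members}
        if len(masses) != 1 or len(terminators) != 1 or not terminators <= DIDEOXY:
            return False
        set_masses.append(masses.pop())
    return all(abs(a - b) >= min_gap for a, b in combinations(set_masses, 2))
```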

[0022] In another aspect, the invention provides for a composition comprising a first plurality of sets of compounds as described in the preceding paragraph, in combination with a second plurality of sets of compounds having the formula T^ms-L-MOI, wherein T^ms is an organic group detectable by mass spectrometry, comprising carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen, nitrogen, sulfur, phosphorus and iodine; L is an organic group which allows a T^ms-containing moiety to be cleaved from the remainder of the compound, wherein the T^ms-containing moiety comprises a functional group which supports a single ionized charge state when the compound is subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and organic acid; MOI is a nucleic acid fragment wherein L is conjugated to the MOI at a location other than the 3' end of the MOI; and wherein all members within the second plurality have an MOI sequence which terminates with the same dideoxynucleotide selected from ddAMP, ddGMP, ddCMP and ddTMP; with the proviso that the dideoxynucleotide present in the compounds of the first plurality is not the same dideoxynucleotide present in the compounds of the second plurality.

[0023] In another aspect, the invention provides for a kit for DNA sequencing analysis. The kit comprises a plurality of container sets, each container set comprising at least five containers, wherein a first container contains a vector, and second, third, fourth and fifth containers contain compounds of the formula T^ms-L-MOI, wherein T^ms is an organic group detectable by mass spectrometry, comprising carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen, nitrogen, sulfur, phosphorus and iodine; L is an organic group which allows a T^ms-containing moiety to be cleaved from the remainder of the compound, wherein the T^ms-containing moiety comprises a functional group which supports a single ionized charge state when the compound is subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and organic acid; and MOI is a nucleic acid fragment wherein L is conjugated to the MOI at a location other than the 3' end of the MOI; such that the MOI for the second, third, fourth and fifth containers is identical and complementary to a portion of the vector within the set of containers, and the T^ms group within each container is different from the other T^ms groups in the kit.

[0024] In another aspect, the invention provides for a system for determining the sequence of a nucleic acid molecule. The system comprises a separation apparatus that separates tagged nucleic acid fragments, an apparatus that cleaves from a tagged nucleic acid fragment a tag which is correlative with a particular nucleotide and detectable by electrochemical detection, and an apparatus for potentiostatic amperometry.

[0025] Within other embodiments of the invention, 4, 5, 10, 15, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 200, 250, 300, 350, 400, 450 or greater than 500 different and unique tagged molecules may be utilized within a given reaction simultaneously, wherein each tag is unique for a selected nucleic acid fragment, probe, or first or second member, and may be separately identified.
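Suppose each container set's four tagged MOIs are used in the four separate dideoxy reactions. This is an assumption: the kit description above only requires that no T^ms group repeat within the kit. On that basis, the number of tags bounds the number of templates that can be run together, so n tags support n // 4 container sets. The hypothetical helper below hands out tags accordingly; with a set of 36 tags such as the one in Figures 3-10, it allows nine container sets.

```python
def plan_container_sets(tag_ids, n_sets):
    """Hypothetical: give each container set four tags never reused in the kit.

    Assumes each of a set's four tagged primers is used in a different
    dideoxy reaction (A, C, G, T).
    """
    tag_ids = list(tag_ids)
    if len(set(tag_ids)) != len(tag_ids):
        raise ValueError("tags must be unique")
    if 4 * n_sets > len(tag_ids):
        raise ValueError(f"{len(tag_ids)} tags support at most {len(tag_ids) // 4} sets")
    tags = iter(tag_ids)
    return [{base: next(tags) for base in "ACGT"} for _ in range(n_sets)]
```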

[0026] These and other aspects of the present invention will become evident upon reference to the following detailed description and attached drawings. In addition, various references are set forth below which describe in more detail certain procedures or compositions (e.g., plasmids, etc.), and are therefore incorporated by reference in their entirety.

BRIEF DESCRIPTION OF THE DRAWINGS
[0027]

Figure 1 depicts the flowchart for the synthesis of pentafluorophenyl esters of chemically cleavable mass spectroscopy tags, to liberate tags with carboxyl amide termini.

Figure 2 depicts the flowchart for the synthesis of pentafluorophenyl esters of chemically cleavable mass spectroscopy tags, to liberate tags with carboxyl acid termini.

Figures 3-6 and 8 depict the flowchart for the synthesis of tetrafluorophenyl esters of a set of 36 photochemically cleavable mass spectroscopy tags.

Figure 7 depicts the flowchart for the synthesis of a set of 36 amine-terminated photochemically cleavable mass spectroscopy tags.

Figure 9 depicts the synthesis of 36 photochemically cleavable mass spectroscopy tagged oligonucleotides made from the corresponding set of 36 tetrafluorophenyl esters of photochemically cleavable mass spectroscopy tag acids.

Figure 10 depicts the synthesis of 36 photochemically cleavable mass spectroscopy tagged oligonucleotides made from the corresponding set of 36 amine-terminated photochemically cleavable mass spectroscopy tags.

Figure 11 illustrates the simultaneous detection of multiple tags by mass spectrometry.

Figure 12 shows the mass spectrogram of the alpha-cyano matrix alone.

Figure 13 depicts a modularly constructed tagged nucleic acid fragment.

DETAILED DESCRIPTION OF THE INVENTION 

[0028] Briefly stated, in one aspect the present invention provides compounds wherein a molecule of interest, or precursor thereto, is linked via a labile bond (or labile bonds) to a tag. Thus, compounds of the invention may be viewed as having the general formula:

T-L-X

wherein T is the tag component, L is the linker component that either is, or contains, a labile bond, and X is either the molecule of interest (MOI) component or a functional group component (L^h) through which the MOI may be joined to T-L. Compounds of the invention may therefore be represented by the more specific general formulas:

T-L-MOI and T-L-L^h

[0029] For reasons described in detail below, sets of T-L-MOI compounds may be purposely subjected to conditions that cause the labile bond(s) to break, thus releasing a tag moiety from the remainder of the compound. The tag moiety is then characterized by one or more analytical techniques, to thereby provide direct information about the structure of the tag moiety, and (most importantly) indirect information about the identity of the corresponding MOI.
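The T-L-MOI idea, cleaving a labile bond and identifying the MOI indirectly through its released tag, can be modeled with a small data structure. The class, field names and cleavage-condition labels below are hypothetical, and real cleavage is chemistry rather than a string comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedCompound:
    """A T-L-MOI compound reduced to what matters for read-out."""
    tag: str         # T: identifier of the tag component
    cleaved_by: str  # condition that breaks the labile bond in L (hypothetical labels)
    moi: str         # molecule of interest, e.g. a nucleic acid fragment


def cleave(compounds, condition):
    """Release tag moieties from compounds whose labile bond breaks under `condition`."""
    return [c.tag for c in compounds if c.cleaved_by == condition]


def identify_mois(released_tags, library):
    """Indirect identification: map each characterized tag back to its MOI.

    Requires every tag in `library` to be unique.
    """
    tag_to_moi = {c.tag: c.moi for c in library}
    if len(tag_to_moi) != len(library):
        raise ValueError("tags must be unique within the library")
    return [tag_to_moi[t] for t in released_tags]


library = [
    TaggedCompound(tag="T1", cleaved_by="acid_or_base", moi="fragment-1"),  # amide-type linker
    TaggedCompound(tag="T2", cleaved_by="light", moi="fragment-2"),         # photolabile linker
]
released = cleave(library, "light")
# released == ["T2"]; identify_mois(released, library) == ["fragment-2"]
```

The uniqueness check reflects the point that a tag can only report on its MOI if no other compound in the set carries the same tag.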

[0030] As a simple illustrative example of a representative compound of the invention wherein L is a direct bond, reference is made to the following structure (i):




[Structure (i)]



In structure (i), T is a nitrogen-containing polycyclic aromatic moiety bonded to a carbonyl group, X is a MOI (specifically, a nucleic acid fragment terminating in an amine group), and L is the bond which forms an amide group. The amide bond is labile relative to the bonds in T because, as recognized in the art, an amide bond may be chemically cleaved (broken) by acid or base conditions which leave the bonds within the tag component unchanged. Thus, a tag moiety (i.e., the cleavage product that contains T) may be released as shown below:



[Scheme: Structure (i) → (acid or base) → Tag Moiety (T-C(=O)OH) + Remainder of the Compound (H2N-(Nucleic Acid Fragment))]



[0031] However, the linker L may be more than merely a direct bond, as shown in the following illustrative example, where reference is made to another representative compound of the invention having the structure (ii) shown below:



[Structure (ii): tag (T), linker (L) and MOI (nucleic acid fragment)]



It is well known that compounds having an ortho-nitrobenzylamine moiety (see boxed atoms within structure (ii)) are photolytically unstable, in that exposure of such compounds to actinic radiation of a specified wavelength will cause selective cleavage of the benzylamine bond (see bond denoted with heavy line in structure (ii)). Thus, structure (ii) has the same T and MOI groups as structure (i); however, the linker group contains multiple atoms and bonds within which there is a particularly labile bond. Photolysis of structure (ii) thus releases a tag moiety (T-containing moiety) from the remainder of the compound, as shown below.



[Scheme: Structure (ii) → (photolysis) → Tag Moiety + Remainder of the Compound (Nucleic Acid Fragment)]



[0032] The invention thus provides compounds which, upon exposure to appropriate cleavage conditions, undergo a cleavage reaction so as to release a tag moiety from the remainder of the compound. Compounds of the invention may be described in terms of the tag moiety, the MOI (or precursor thereto, L^h), and the labile bond(s) which join the two groups together. Alternatively, the compounds of the invention may be described in terms of the components from which they are formed. Thus, the compounds may be described as the reaction product of a tag reactant, a linker reactant and a MOI reactant as follows.

[0033] The tag reactant consists of a chemical handle (T^h) and a variable component (T^vc), so that the tag reactant is seen to have the general structure:

T^vc-T^h



To illustrate this nomenclature, reference may be made to structure (iii), which shows a tag reactant that may be used to prepare the compound of structure (ii). The tag reactant having structure (iii) contains a tag variable component and a tag handle, as shown below:



[Structure (iii): Tag Variable Component + Tag Handle]
[0034] In structure (iii), the tag handle (-C(=O)-A) simply provides an avenue for reacting the tag reactant with the linker reactant to form a T-L moiety. The group "A" in structure (iii) indicates that the carboxyl group is in a chemically active state, so it is ready for coupling with other handles. "A" may be, for example, a hydroxyl group or pentafluorophenoxy, among many other possibilities. The invention provides for a large number of possible tag handles which may be bonded to a tag variable component, as discussed in detail below. The tag variable component is thus a part of T in the formula T-L-X, and will also be part of the tag moiety that forms from the reaction that cleaves L.
[0035] As also discussed in detail below, the tag variable component is so named because, in preparing sets of compounds according to the invention, it is desired that members of a set have unique variable components, so that the individual members may be distinguished from one another by an analytical technique. As one example, the tag variable component of structure (iii) may be one member of the following set, where members of the set may be distinguished by their UV or mass spectra:

[Set of tag variable components]




[0036] Likewise, the linker reactant may be described in terms of its chemical handles (there are necessarily at least two, each of which may be designated as L_h) which flank a linker labile component, where the linker labile component consists of the required labile moiety (L^2) and optional labile moieties (L^1 and L^3), where the optional labile moieties effectively serve to separate L^2 from the handles L_h, and the required labile moiety serves to provide a labile bond within the linker labile component. Thus, the linker reactant may be seen to have the general formula:

L_h-L^1-L^2-L^3-L_h

[0037] The nomenclature used to describe the linker reactant may be illustrated in view of structure (iv), which again draws from the compound of structure (ii):

[Structure (iv): linker reactant; drawing not reproduced. Labeled region: Linker Handle.]




[0038] As structure (iv) illustrates, atoms may serve in more than one functional role. Thus, in structure (iv), the benzyl nitrogen functions as a chemical handle in allowing the linker reactant to join to the tag reactant via an amide-forming reaction, and subsequently also serves as a necessary part of the structure of the labile moiety in that the benzylic carbon-nitrogen bond is particularly susceptible to photolytic cleavage. Structure (iv) also illustrates that a linker reactant may have an L^3 group (in this case, a methylene group), although not have an L^1 group. Likewise, linker reactants may have an L^1 group but not an L^3 group, or may have L^1 and L^3 groups, or may have neither L^1 nor L^3 groups. In structure (iv), the presence of the group "P" next to the carbonyl group indicates that the carbonyl group is protected from reaction. Given this configuration, the activated carboxyl group of the tag reactant (iii) may cleanly react with the amine group of the linker reactant (iv) to form an amide bond and give a compound of the formula T-L-L_h.
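The four possible arrangements of the optional labile moieties can be enumerated mechanically from the general formula L_h-L^1-L^2-L^3-L_h. The Python below is a purely symbolic sketch: the strings "Lh", "L1", "L2" and "L3" are hypothetical placeholders for L_h, L^1, L^2 and L^3, not chemical structures, and the helper names are invented for this illustration.

```python
from itertools import product

# Hypothetical symbolic placeholders for the linker reactant components:
# "Lh" = chemical handle, "L2" = required labile moiety,
# "L1" / "L3" = optional labile moieties flanking L2.
HANDLE, L1, L2, L3 = "Lh", "L1", "L2", "L3"


def linker_formula(has_l1: bool, has_l3: bool) -> str:
    """Build Lh-[L1]-L2-[L3]-Lh for one choice of the optional groups."""
    parts = [HANDLE]
    if has_l1:
        parts.append(L1)
    parts.append(L2)  # the required labile moiety is always present
    if has_l3:
        parts.append(L3)
    parts.append(HANDLE)
    return "-".join(parts)


def all_linker_variants() -> list:
    """Enumerate all four combinations of the optional L1 and L3 groups."""
    return [linker_formula(l1, l3) for l1, l3 in product((True, False), repeat=2)]


if __name__ == "__main__":
    for formula in all_linker_variants():
        print(formula)
```

Running it prints Lh-L1-L2-L3-Lh, Lh-L1-L2-Lh, Lh-L2-L3-Lh and Lh-L2-Lh; every variant keeps the required L2 between the two handles.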
[0039] The MOI reactant is a suitably reactive form of a molecule of interest. Where the molecule of interest is a nucleic acid fragment, a suitable MOI reactant is a nucleic acid fragment bonded through its 5' hydroxyl group to a phosphodiester group and then to an alkylene chain that terminates in an amino group. This amino group may then react with the carbonyl group of structure (iv) (after, of course, deprotecting the carbonyl group, and preferably after subsequently activating the carbonyl group toward reaction with the amine group) to thereby join the MOI to the linker.
[0040] When viewed in chronological order, the invention is seen to take a tag reactant (having a chemical tag handle and a tag variable component), a linker reactant (having two chemical linker handles, a required labile moiety and 0-2 optional labile moieties) and a MOI reactant (having a molecule of interest component and a chemical molecule of interest handle) to form T-L-MOI. Thus, to form T-L-MOI, either the tag reactant and the linker reactant are first reacted together to provide T-L-L_h, and then the MOI reactant is reacted with T-L-L_h so as to provide T-L-MOI, or else (less preferably) the linker reactant and the MOI reactant are reacted together first to provide L_h-L-MOI, and then L_h-L-MOI is reacted with the tag reactant to provide T-L-MOI. For purposes of convenience, compounds having the formula T-L-MOI will be described in terms of the tag reactant, the linker reactant and the MOI reactant which may be used to form such compounds. Of course, the same compounds of formula T-L-MOI could be prepared by other (typically, more laborious) methods, and still fall within the scope of the inventive T-L-MOI compounds.

[0041] In any event, the invention provides that a T-L-MOI compound be subjected to cleavage conditions, such that a tag moiety is released from the remainder of the compound. The tag moiety will comprise at least the tag variable component, and will typically additionally comprise some or all of the atoms from the tag handle, some or all of the atoms from the linker handle that was used to join the tag reactant to the linker reactant, and the optional labile moiety L^1 if this group was present in T-L-MOI; it may also contain some part of the required labile moiety L^2, depending on the precise structure of L^2 and the nature of the cleavage chemistry. For convenience, the tag moiety may be referred to as the T-containing moiety because T will typically constitute the major portion (in terms of mass) of the tag moiety.
[0042] Given this introduction to one aspect of the present invention, the various components T, L and X will be described in detail. This description begins with the following definitions of certain terms, which will be used hereinafter in describing T, L and X.

[0043] As used herein, the term "nucleic acid fragment" means a molecule which is complementary to a selected target nucleic acid molecule (i.e., complementary to all or a portion thereof), and may be derived from nature or synthetically or recombinantly produced, including non-naturally occurring molecules, and may be in double or single stranded form where appropriate; and includes an oligonucleotide (e.g., DNA or RNA), a primer, a probe, a nucleic acid analog (e.g., PNA), an oligonucleotide which is extended in a 5' to 3' direction by a polymerase, a nucleic acid which is cleaved chemically or enzymatically, a nucleic acid that is terminated with a dideoxy terminator or capped at the 3' or 5' end with a compound that prevents polymerization at the 5' or 3' end, and combinations thereof. The complementarity of a nucleic acid fragment to a selected target nucleic acid molecule generally means the exhibition of at least about 70% specific base pairing throughout the length of the fragment. Preferably the nucleic acid fragment exhibits at least about 80% specific base pairing, and most preferably at least about 90%. Assays for determining the percent mismatch (and thus the percent specific base pairing) are well known in the art and are based upon the percent mismatch as a function of the Tm when referenced to the fully base-paired control.
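The 70%, 80% and 90% thresholds above are stated as percent specific base pairing. The text measures this through Tm-based assays; as a simpler sequence-level illustration only, the Python sketch below counts Watson-Crick pairs between a fragment and an equal-length target region laid antiparallel with no gaps. The function names, the DNA-only pairing table and the gapless alignment are all assumptions of this sketch.

```python
# Watson-Crick base pairs for DNA (assumption: RNA, analogs and
# wobble pairs are ignored in this sketch).
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def percent_base_pairing(fragment: str, target_region: str) -> float:
    """Percent of fragment positions forming a Watson-Crick pair with the target.

    Both sequences are written 5'->3'. The fragment is laid antiparallel
    against an equal-length target region with no gaps, so fragment
    position i faces target position len - 1 - i.
    """
    fragment = fragment.upper()
    target_region = target_region.upper()
    if not fragment or len(fragment) != len(target_region):
        raise ValueError("fragment and target region must be non-empty and equal in length")
    facing = reversed(target_region)  # read the target 3'->5'
    paired = sum((f, t) in PAIRS for f, t in zip(fragment, facing))
    return 100.0 * paired / len(fragment)


def meets_threshold(fragment: str, target_region: str, minimum: float = 70.0) -> bool:
    """True if the fragment reaches the given percent specific base pairing."""
    return percent_base_pairing(fragment, target_region) >= minimum
```

For the target region 5'-ACGTAC-3', the fragment 5'-GTACGT-3' pairs at every position (100%), while 5'-GTACGA-3' pairs at five of six positions (about 83%), which passes the 80% level but not the 90% level.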

[0044] As used herein, the term "alkyl," alone or in combination, refers to a saturated, straight-chain or branched-chain hydrocarbon radical containing from 1 to 10, preferably from 1 to 6 and more preferably from 1 to 4, carbon atoms. Examples of such radicals include, but are not limited to, methyl, ethyl, n-propyl, iso-propyl, n-butyl, iso-butyl, sec-butyl, tert-butyl, pentyl, iso-amyl, hexyl, decyl and the like. The term "alkylene" refers to a saturated, straight-chain or branched-chain hydrocarbon diradical containing from 1 to 10, preferably from 1 to 6 and more preferably from 1 to 4, carbon atoms. Examples of such diradicals include, but are not limited to, methylene, ethylene (-CH2-CH2-), propylene and the like.

[0045] The term "alkenyl," alone or in combination, refers to a straight-chain or branched-chain hydrocarbon radical having at least one carbon-carbon double bond in a total of from 2 to 10, preferably from 2 to 6 and more preferably from 2 to 4, carbon atoms. Examples of such radicals include, but are not limited to, ethenyl, E- and Z-propenyl, isopropenyl, E- and Z-butenyl, E- and Z-isobutenyl, E- and Z-pentenyl, decenyl and the like. The term "alkenylene" refers to a straight-chain or branched-chain hydrocarbon diradical having at least one carbon-carbon double bond in a total of from 2 to 10, preferably from 2 to 6 and more preferably from 2 to 4, carbon atoms. Examples of such diradicals include, but are not limited to, methylidene (=CH2), ethylidene (-CH=CH-), propylidene (-CH2-CH=CH-) and the like.
[0046] The term "alkynyl," alone or in combination, refers to a straight-chain or branched-chain hydrocarbon radical having at least one carbon-carbon triple bond in a total of from 2 to 10, preferably from 2 to 6 and more preferably from 2 to 4, carbon atoms. Examples of such radicals include, but are not limited to, ethynyl (acetylenyl), propynyl (propargyl), butynyl, hexynyl, decynyl and the like. The term "alkynylene", alone or in combination, refers to a straight-chain or branched-chain hydrocarbon diradical having at least one carbon-carbon triple bond in a total of from 2 to 10, preferably from 2 to 6 and more preferably from 2 to 4, carbon atoms. Examples of such diradicals include, but are not limited to, ethynylene (-C≡C-), propynylene (-CH2-C≡C-) and the like.
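The definitions above fix carbon-count ranges; the matching hydrogen counts follow from valence arithmetic. The Python sketch below is an illustration only, with hypothetical function names: starting from the saturated alkane C_nH_(2n+2), each open valence site removes one hydrogen, each carbon-carbon double bond two, and each triple bond four. It covers acyclic radicals and diradicals and does not test chemical feasibility beyond a non-negative hydrogen count.

```python
# Broadest carbon-count ranges quoted in the definitions above.
CARBON_RANGES = {"alkyl": (1, 10), "alkenyl": (2, 10), "alkynyl": (2, 10)}


def hydrocarbon_hydrogens(n_carbons: int, open_valences: int = 1,
                          double_bonds: int = 0, triple_bonds: int = 0) -> int:
    """Hydrogen count of an acyclic hydrocarbon radical (1 open valence) or diradical (2)."""
    if n_carbons < 1:
        raise ValueError("at least one carbon atom is required")
    # Saturated parent C_nH_(2n+2), minus one H per open valence,
    # two per C=C double bond and four per C-C triple bond.
    hydrogens = 2 * n_carbons + 2 - open_valences - 2 * double_bonds - 4 * triple_bonds
    if hydrogens < 0:
        raise ValueError("too many unsaturations or open valences for this carbon count")
    return hydrogens


def in_defined_range(kind: str, n_carbons: int) -> bool:
    """Check a carbon count against the broadest range given for that radical type."""
    low, high = CARBON_RANGES[kind]
    return low <= n_carbons <= high
```

For example, methyl gives CH3, ethenyl C2H3, ethynyl C2H, ethylene (-CH2-CH2-) C2H4 and propynylene (-CH2-C≡C-) C3H2, consistent with the examples in the definitions.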

[0047] The term "cycloalkyl," alone or in combination, refers to a saturated, cyclic arrangement of carbon atoms which number from 3 to 8 and preferably from 3 to 6 carbon atoms. Examples of such cycloalkyl radicals include, but are not limited to, cyclopropyl, cyclobutyl, cyclopentyl, cyclohexyl and the like. The term "cycloalkylene" refers to a diradical form of a cycloalkyl.

[0048] The term "cycloalkenyl," alone or in combination, refers to a cyclic carbocycle containing from 4 to 8, preferably 5 or 6, carbon atoms and one or more double bonds. Examples of such cycloalkenyl radicals include, but are not limited to, cyclopentenyl, cyclohexenyl, cyclopentadienyl and the like. The term "cycloalkenylene" refers to a diradical form of a cycloalkenyl.

[0049] The term "aryl" refers to a carbocyclic (consisting entirely of carbon and hydrogen) aromatic group selected from the group consisting of phenyl, naphthyl, indenyl, indanyl, azulenyl, fluorenyl, and anthracenyl; or a heterocyclic aromatic group selected from the group consisting of furyl, thienyl, pyridyl, pyrrolyl, oxazolyl, thiazolyl, imidazolyl, pyrazolyl, 2-pyrazolinyl, pyrazolidinyl, isoxazolyl, isothiazolyl, 1,2,3-oxadiazolyl, 1,2,3-triazolyl, 1,3,4-thiadiazolyl, pyridazinyl, pyrimidinyl, pyrazinyl, 1,3,5-triazinyl, 1,3,5-trithianyl, indolizinyl, indolyl, isoindolyl, 3H-indolyl, indolinyl, benzo[b]furanyl, 2,3-dihydrobenzofuranyl, benzo[b]thiophenyl, 1H-indazolyl, benzimidazolyl, benzthiazolyl, purinyl, 4H-quinolizinyl, quinolinyl, isoquinolinyl, cinnolinyl, phthalazinyl, quinazolinyl, quinoxalinyl, 1,8-naphthyridinyl, pteridinyl, carbazolyl, acridinyl, phenazinyl, phenothiazinyl, and phenoxazinyl.

[0050] "Aryl" groups, as defined in this application, may independently contain one to four substituents which are independently selected from the group consisting of hydrogen, halogen, hydroxyl, amino, nitro, trifluoromethyl, trifluoromethoxy, alkyl, alkenyl, alkynyl, cyano, carboxy, carboalkoxy, 1,2-dioxyethylene, alkoxy, alkenoxy or alkynoxy, alkylamino, alkenylamino, alkynylamino, aliphatic or aromatic acyl, alkoxycarbonylamino, alkylsulfonylamino, morpholinocarbonylamino, thiomorpholinocarbonylamino, N-alkyl guanidino, aralkylaminosulfonyl; aralkoxyalkyl; N-aralkoxyurea; N-hydroxylurea; N-alkenylurea; N,N-(alkyl, hydroxyl)urea; heterocyclyl; thioaryloxy-substituted aryl; N,N-(aryl, alkyl)hydrazino; Ar'-substituted sulfonylheterocyclyl; aralkyl-substituted heterocyclyl; cycloalkyl and cycloalkenyl-substituted heterocyclyl; cycloalkyl-fused aryl; aryloxy-substituted alkyl; heterocyclylamino; aliphatic or aromatic acylaminocarbonyl; aliphatic or aromatic acyl-substituted alkenyl; Ar'-substituted aminocarbonyloxy; Ar',Ar'-disubstituted aryl; aliphatic or aromatic acyl-substituted acyl; cycloalkylcarbonylalkyl; cycloalkyl-substituted amino; aryloxycarbonylalkyl; phosphorodiamidyl acid or ester.

[0051] "Ar'" is a carbocyclic or heterocyclic aryl group as defined above having one to three substituents selected from the group consisting of hydrogen, halogen, hydroxyl, amino, nitro, trifluoromethyl, trifluoromethoxy, alkyl, alkenyl, alkynyl, 1,2-dioxymethylene, 1,2-dioxyethylene, alkoxy, alkenoxy, alkynoxy, alkylamino, alkenylamino or alkynylamino, alkylcarbonyloxy, aliphatic or aromatic acyl, alkylcarbonylamino, alkoxycarbonylamino, alkylsulfonylamino, N-alkyl or N,N-dialkyl urea.

[0052] The term "alkoxy," alone or in combination, refers to an alkyl ether radical, wherein the term "alkyl" is as defined above. Examples of suitable alkyl ether radicals include, but are not limited to, methoxy, ethoxy, n-propoxy, iso-propoxy, n-butoxy, iso-butoxy, sec-butoxy, tert-butoxy and the like.

[0053] The term "alkenoxy," alone or in combination, refers to a radical of formula alkenyl-O-, wherein the term "alkenyl" is as defined above, provided that the radical is not an enol ether. Examples of suitable alkenoxy radicals include, but are not limited to, allyloxy, E- and Z-3-methyl-2-propenoxy and the like.

[0054] The term "alkynyloxy," alone or in combination, refers to a radical of formula alkynyl-O-, wherein the term "alkynyl" is as defined above, provided that the radical is not an ynol ether. Examples of suitable alkynoxy radicals include, but are not limited to, propargyloxy, 2-butynyloxy and the like.

[0055] The term "thioalkoxy" refers to a thioether radical of formula alkyl-S-, wherein alkyl is as defined above.
[0056] The term "alkylamino," alone or in combination, refers to a mono- or di-alkyl-substituted amino radical (i.e., a radical of formula alkyl-NH- or (alkyl)2N-), wherein the term "alkyl" is as defined above. Examples of suitable alkylamino radicals include, but are not limited to, methylamino, ethylamino, propylamino, isopropylamino, t-butylamino, N,N-diethylamino and the like.

[0057] The term "alkenylamino," alone or in combination, refers to a radical of formula alkenyl-NH- or (alkenyl)2N-, wherein the term "alkenyl" is as defined above, provided that the radical is not an enamine. An example of such alkenylamino radicals is the allylamino radical.

[0058] The term "alkynylamino," alone or in combination, refers to a radical of formula alkynyl-NH- or (alkynyl)2N-, wherein the term "alkynyl" is as defined above, provided that the radical is not an ynamine. An example of such alkynylamino radicals is the propargylamino radical.

[0059] The term "amide" refers to either -N(R')-C(=O)- or -C(=O)-N(R')-, where R' is defined herein to include hydrogen as well as other groups. The term "substituted amide" refers to the situation where R' is not hydrogen, while the term "unsubstituted amide" refers to the situation where R' is hydrogen.

[0060] The term "aryloxy," alone or in combination, refers to a radical of formula aryl-O-, wherein aryl is as defined above. Examples of aryloxy radicals include, but are not limited to, phenoxy, naphthoxy, pyridyloxy and the like.
[0061] The term "arylamino," alone or in combination, refers to a radical of formula aryl-NH-, wherein aryl is as defined above. Examples of arylamino radicals include, but are not limited to, phenylamino (anilido), naphthylamino, 2-, 3- and 4-pyridylamino and the like.

[0062] The term "aryl-fused cycloalkyl," alone or in combination, refers to a cycloalkyl radical which shares two adjacent atoms with an aryl radical, wherein the terms "cycloalkyl" and "aryl" are as defined above. An example of an aryl-fused cycloalkyl radical is the benzofused cyclobutyl radical.
[0063] The term "alkylcarbonylamino," alone or in combination, refers to a radical of formula alkyl-CONH-, wherein the term "alkyl" is as defined above.

[0064] The term "alkoxycarbonylamino," alone or in combination, refers to a radical of formula alkyl-OCONH-, wherein the term "alkyl" is as defined above.

[0065] The term "alkylsulfonylamino," alone or in combination, refers to a radical of formula alkyl-SO2NH-, wherein the term "alkyl" is as defined above.

[0066] The term "arylsulfonylamino," alone or in combination, refers to a radical of formula aryl-SO2NH-, wherein the term "aryl" is as defined above.

[0067] The term "N-alkylurea," alone or in combination, refers to a radical of formula alkyl-NH-CO-NH-, wherein the term "alkyl" is as defined above.
[0068] The term "N-arylurea," alone or in combination, refers to a radical of formula aryl-NH-CO-NH-, wherein the term "aryl" is as defined above.

[0069] The term "halogen" means fluorine, chlorine, bromine and iodine. 

[0070] The term "hydrocarbon radical" refers to an arrangement of carbon and hydrogen atoms which needs only a single hydrogen atom to be an independent stable molecule. Thus, a hydrocarbon radical has one open valence site on a carbon atom, through which the hydrocarbon radical may be bonded to other atom(s). Alkyl, alkenyl, cycloalkyl, etc. are examples of hydrocarbon radicals.

[0071] The term "hydrocarbon diradical" refers to an arrangement of carbon and hydrogen atoms which needs two hydrogen atoms in order to be an independent stable molecule. Thus, a hydrocarbon diradical has two open valence sites on one or two carbon atoms, through which the hydrocarbon diradical may be bonded to other atom(s). Alkylene, alkenylene, alkynylene, cycloalkylene, etc. are examples of hydrocarbon diradicals.

[0072] The term "hydrocarbyl" refers to any stable arrangement consisting entirely of carbon and hydrogen having a single valence site to which it is bonded to another moiety, and thus includes radicals known as alkyl, alkenyl, alkynyl, cycloalkyl, cycloalkenyl, aryl (without heteroatom incorporation into the aryl ring), arylalkyl, alkylaryl and the like. Hydrocarbon radical is another name for hydrocarbyl.

[0073] The term "hydrocarbylene" refers to any stable arrangement consisting entirely of carbon and hydrogen having two valence sites to which it is bonded to other moieties, and thus includes alkylene, alkenylene, alkynylene, cycloalkylene, cycloalkenylene, arylene (without heteroatom incorporation into the arylene ring), arylalkylene, alkylarylene and the like. Hydrocarbon diradical is another name for hydrocarbylene.

[0074] The term "hydrocarbyl-O-hydrocarbylene" refers to a hydrocarbyl group bonded to an oxygen atom, where the oxygen atom is likewise bonded to a hydrocarbylene group at one of the two valence sites at which the hydrocarbylene group is bonded to other moieties. The terms "hydrocarbyl-S-hydrocarbylene", "hydrocarbyl-NH-hydrocarbylene" and "hydrocarbyl-amide-hydrocarbylene" have equivalent meanings, where oxygen has been replaced with sulfur, -NH- or an amide group, respectively.

[0075] The term N-(hydrocarbyl)hydrocarbylene refers to a hydrocarbylene group wherein one of the two valence sites is bonded to a nitrogen atom, and that nitrogen atom is simultaneously bonded to a hydrogen and a hydrocarbyl group. The term N,N-di(hydrocarbyl)hydrocarbylene refers to a hydrocarbylene group wherein one of the two valence sites is bonded to a nitrogen atom, and that nitrogen atom is simultaneously bonded to two hydrocarbyl groups.

[0076] The term "hydrocarbylacyl-hydrocarbylene" refers to a hydrocarbyl group bonded through an acyl (-C(=O)-) group to one of the two valence sites of a hydrocarbylene group.
[0077] The terms "heterocyclylhydrocarbyl" and "heterocyclyl" refer to a stable, cyclic arrangement of atoms which include carbon atoms and up to four atoms (referred to as heteroatoms) selected from oxygen, nitrogen, phosphorus and sulfur. The cyclic arrangement may be in the form of a monocyclic ring of 3-7 atoms, or a bicyclic ring of 8-11 atoms. The rings may be saturated or unsaturated (including aromatic rings), and may optionally be benzofused. Nitrogen and sulfur atoms in the ring may be in any oxidized form, including the quaternized form of nitrogen. A heterocyclylhydrocarbyl may be attached at any endocyclic carbon or heteroatom which results in the creation of a stable structure. Preferred heterocyclylhydrocarbyls include 5-7 membered monocyclic heterocycles containing one or two nitrogen heteroatoms.

[0078] A substituted heterocyclylhydrocarbyl refers to a heterocyclylhydrocarbyl as defined above, wherein at least one ring atom thereof is bonded to an indicated substituent which extends off of the ring.
[0079] In referring to hydrocarbyl and hydrocarbylene groups, the term "derivatives of any of the foregoing wherein one or more hydrogens is replaced with an equal number of fluorides" refers to molecules that contain carbon, hydrogen and fluoride atoms, but no other atoms.

[0080] The term "activated ester" is an ester that contains a "leaving group" which is readily displaceable by a nucleophile, such as an amine, an alcohol or a thiol nucleophile. Such leaving groups are well known and include, without limitation, N-hydroxysuccinimide, N-hydroxybenzotriazole, halogen (halides), alkoxy including tetrafluorophenolates, thioalkoxy and the like. The term "protected ester" refers to an ester group that is masked or otherwise unreactive. See, e.g., Greene, "Protecting Groups In Organic Synthesis."
[0081] In view of the above definitions, other chemical terms used throughout this application can be easily understood by those of skill in the art. Terms may be used alone or in any combination thereof. The preferred and more preferred chain lengths of the radicals apply to all such combinations.

A. GENERATION OF TAGGED NUCLEIC ACID FRAGMENTS


[0082] As noted above, one aspect of the present invention provides a general scheme for DNA sequencing which allows the use of more than 16 tags in each lane; with continuous detection, the tags can be detected and the sequence read as the size separation is occurring, just as with conventional fluorescence-based sequencing. This scheme is applicable to any of the DNA sequencing techniques based on size separation of tagged molecules. Suitable tags and linkers for use within the present invention, as well as methods for sequencing nucleic acids, are discussed in more detail below.
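As a sketch of how continuous detection could turn cleaved tags into sequence, the Python below assumes a hypothetical tag set in which each tag identifies one sample and one terminating nucleotide, and that each detection event reports the fragment length (from the size separation) together with the tag identified. Five samples times four bases gives twenty tags in one lane, more than 16. The names `TAG_TO_CALL`, `read_sequences` and the sample layout are illustrative assumptions, not part of the described method.

```python
from collections import defaultdict
from itertools import product

# Hypothetical tag assignment: five samples x four terminating bases
# gives twenty distinguishable tags in a single lane.
SAMPLES = ("s1", "s2", "s3", "s4", "s5")
BASES = "ACGT"
TAG_TO_CALL = dict(enumerate(product(SAMPLES, BASES)))   # tag id -> (sample, base)
CALL_TO_TAG = {call: tag for tag, call in TAG_TO_CALL.items()}


def read_sequences(detections, tag_to_call=TAG_TO_CALL):
    """Assemble per-sample sequences from (fragment_length, tag_id) detections.

    Each fragment length contributes one base to the sample identified by
    the cleaved tag. This sketch assumes every length is observed once per
    sample (no gaps) and rejects conflicting calls at the same length.
    """
    calls = defaultdict(dict)  # sample -> {fragment_length: base}
    for length, tag_id in detections:
        sample, base = tag_to_call[tag_id]
        previous = calls[sample].setdefault(length, base)
        if previous != base:
            raise ValueError(f"conflicting calls for {sample} at length {length}")
    return {sample: "".join(by_len[n] for n in sorted(by_len))
            for sample, by_len in calls.items()}


if __name__ == "__main__":
    # Two samples read in the same lane: s1 = ACG, s2 = TTA.
    events = [(1, CALL_TO_TAG[("s1", "A")]), (1, CALL_TO_TAG[("s2", "T")]),
              (2, CALL_TO_TAG[("s1", "C")]), (2, CALL_TO_TAG[("s2", "T")]),
              (3, CALL_TO_TAG[("s1", "G")]), (3, CALL_TO_TAG[("s2", "A")])]
    print(read_sequences(events))
```

Because the tag, not the lane position, carries both the sample and the base identity, several reactions can share one separation and still be deconvoluted as the fragments are detected.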

1. Tags

[0083] "Tag", as used herein, generally refers to a chemical moiety which is used to uniquely identify a "molecule of interest", and more specifically refers to the tag variable component as well as whatever may be bonded most closely to it in any of the tag reactant, tag component and tag moiety.
[0084] A tag which is useful in the present invention possesses several attributes:

1) It is capable of being distinguished from all other tags. This discrimination from other chemical moieties can be based on the chromatographic behavior of the tag (particularly after the cleavage reaction), its spectroscopic or potentiometric properties, or some combination thereof. Spectroscopic methods by which tags are usefully distinguished include mass spectroscopy (MS), infrared (IR), ultraviolet (UV), and fluorescence, where MS, IR and UV are preferred, and MS the most preferred spectroscopic method. Potentiometric amperometry is a preferred potentiometric method.

2) The tag is capable of being detected when present at 10^-22 to 10^-6 mole.

3) The tag possesses a chemical handle through which it can be attached to the MOI which the tag is intended to uniquely identify. The attachment may be made directly to the MOI, or indirectly through a "linker" group.

4) The tag is chemically stable toward all manipulations to which it is subjected, including attachment and cleavage from the MOI, and any manipulations of the MOI while the tag is attached to it.

5) The tag does not significantly interfere with the manipulations performed on the MOI while the tag is attached to it. For instance, if the tag is attached to an oligonucleotide, the tag must not significantly interfere with any hybridization or enzymatic reactions (e.g., PCR sequencing reactions) performed on the oligonucleotide. Similarly, if the tag is attached to an antibody, it must not significantly interfere with antigen recognition by the antibody.


[0085] A tag moiety which is intended to be detected by a certain spectroscopic or potentiometric method should possess properties which enhance the sensitivity and specificity of detection by that method. Typically, the tag moiety will have those properties because they have been designed into the tag variable component, which will typically constitute the major portion of the tag moiety. In the following discussion, the word "tag" typically refers to the tag moiety (i.e., the cleavage product that contains the tag variable component); however, it can also be considered to refer to the tag variable component itself, because that is the portion of the tag moiety which is typically responsible for providing the uniquely detectable properties. In compounds of the formula T-L-X, the "T" portion will contain the tag variable component. Where the tag variable component has been designed to be characterized by, e.g., mass spectrometry, the "T" portion of T-L-X may be referred to as T^ms. Likewise, the cleavage product from T-L-X that contains T^ms may be referred to as the T^ms-containing moiety. The following spectroscopic and potentiometric methods may be used to characterize T^ms-containing moieties.

a. Characteristics of MS Tags 

[0086] Where a tag is analyzable by mass spectrometry (i.e., is a MS-readable tag, also referred to herein as a MS tag or "T^ms-containing moiety"), the essential feature of the tag is that it is able to be ionized. It is thus a preferred element in the design of MS-readable tags to incorporate therein a chemical functionality which can carry a positive or negative charge under conditions of ionization in the MS. This feature confers improved efficiency of ion formation and greater overall sensitivity of detection, particularly in electrospray ionization. The chemical functionality that supports an ionized charge may derive from T^ms or L or both. Factors that can increase the relative sensitivity of an analyte being detected by mass spectrometry are discussed in, e.g., Sunner, J. et al., Anal. Chem. 60:1300-1307 (1988).
[0087] A preferred functionality to facilitate the carrying of a negative charge is an organic acid, such as phenolic hydroxyl, carboxylic acid, phosphonate, phosphate, tetrazole, sulfonyl urea, perfluoro alcohol and sulfonic acid.
[0088] Preferred functionality to facilitate the carrying of a positive charge under ionization conditions are aliphatic or aromatic amines. Examples of amine functional groups which give enhanced detectability of MS tags include quaternary amines (i.e., amines that have four bonds, each to carbon atoms; see Aebersold, U.S. Patent No. 5,240,859) and tertiary amines (i.e., amines that have three bonds, each to carbon atoms, which includes C=N-C groups such as are present in pyridine; see Hess et al., Anal. Biochem. 224:373, 1995; Bures et al., Anal. Biochem. 224:364, 1995). Hindered tertiary amines are particularly preferred. Tertiary and quaternary amines may be alkyl or aryl. A T^ms-containing moiety must bear at least one ionizable species, but may possess more than one ionizable species. The preferred charge state is a single ionized species per tag. Accordingly, it is preferred that each T^ms-containing moiety (and each tag variable component) contain only a single hindered amine or organic acid group.
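Why a single ionized species per tag is preferred can be seen from m/z arithmetic. The Python sketch below assumes ions form by protonation or deprotonation, as for amines or organic acids under electrospray; a permanently charged quaternary amine would not gain a proton, so this is a simplification. The function names and the 350.0 Da example mass are hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of [M + zH]^z+ for charge > 0, or [M - |z|H]^|z|- for charge < 0."""
    if charge == 0:
        raise ValueError("a neutral species is not observed by MS")
    return (neutral_mass + charge * PROTON_MASS) / abs(charge)


def expected_peaks(neutral_mass: float, ionizable_groups: int, polarity: int = 1) -> list:
    """m/z for each charge state from 1 up to the number of ionizable groups."""
    return [mz(neutral_mass, polarity * z) for z in range(1, ionizable_groups + 1)]
```

For a hypothetical 350.0 Da tag with one basic group, `expected_peaks(350.0, 1)` gives a single peak near m/z 351.0. With two basic groups, the signal may split between about 351.0 and about 176.0, and the doubly charged ion falls into the low-mass region that [0090] describes as hard to resolve.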

[0089] Suitable amine-containing radicals that may form part of the T^ms-containing moiety include the following:

[Structures of amine-containing radicals not reproduced.]
[0090] The identification of a tag by mass spectrometry is preferably based upon its molecular mass to charge ratio (m/z). The preferred molecular mass range of MS tags is from about 100 to 2,000 daltons, and preferably the T^ms-containing moiety has a mass of at least about 250 daltons, more preferably at least about 300 daltons, and still more preferably at least about 350 daltons. It is generally difficult for mass spectrometers to distinguish among moieties having parent ions below about 200-250 daltons (depending on the precise instrument), and thus preferred T^ms-containing moieties of the invention have masses above that range.
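When choosing a set of MS tags, each candidate mass can be screened against the preferred window and against its neighbours. The Python sketch below takes its window (250 to 2,000 daltons) from the text; `min_spacing` is a hypothetical, instrument-dependent resolution parameter that the text does not specify, and the function name is invented for this illustration.

```python
def screen_tag_masses(masses, low=250.0, high=2000.0, min_spacing=1.0):
    """Flag candidate T^ms-containing moiety masses for a tag set.

    Returns (out_of_range, too_close): masses outside [low, high], and
    pairs of neighbouring masses separated by less than min_spacing.
    """
    out_of_range = [m for m in masses if not low <= m <= high]
    ordered = sorted(masses)
    too_close = [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a < min_spacing]
    return out_of_range, too_close
```

For example, screening [180.1, 350.2, 350.6, 412.3] flags 180.1 as below the preferred range and the pair (350.2, 350.6) as too close to resolve at a 1-dalton spacing.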

[0091] As explained above, the T^ms-containing moiety may contain atoms other than those present in the tag variable component, and indeed other than those present in T^ms itself. Accordingly, the mass of T^ms itself may be less than about 250 daltons, so long as the T^ms-containing moiety has a mass of at least about 250 daltons. Thus, the mass of T^ms may range from 15 (i.e., a methyl radical) to about 10,000 daltons, and preferably ranges from 100 to about 5,000 daltons, and more preferably ranges from about 200 to about 1,000 daltons.

[0092] It is relatively difficult to distinguish tags by mass spectrometry when those tags incorporate atoms that have more than one isotope in significant abundance. Accordingly, preferred T groups which are intended for mass spectroscopic identification (T^ms groups) contain carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen, nitrogen, sulfur, phosphorus and iodine. While other atoms may be present in the T^ms, their presence can render analysis of the mass spectral data somewhat more difficult. Preferably, the T^ms groups have only carbon, nitrogen and oxygen atoms, in addition to hydrogen and/or fluoride.

[0093] Fluoride is an optional yet preferred atom to have in a T^ms group. In comparison to hydrogen, fluoride is, of course, much heavier. Thus, the presence of fluoride atoms rather than hydrogen atoms leads to T^ms groups of higher mass, thereby allowing the T^ms group to reach and exceed a mass of 250 daltons, which is desirable as explained above. In addition, the replacement of hydrogen with fluoride confers greater volatility on the T^ms-containing moiety, and greater volatility of the analyte enhances sensitivity when mass spectrometry is being used as the detection method.

[0094] The molecular formula of T^ms falls within the scope of $\mathrm{C}_{1-500}\mathrm{N}_{0-100}\mathrm{O}_{0-100}\mathrm{S}_{0-10}\mathrm{P}_{0-10}\mathrm{H}_{\alpha}\mathrm{F}_{\beta}\mathrm{I}_{\delta}$, wherein the sum of α, β and δ is sufficient to satisfy the otherwise unsatisfied valencies of the C, N, O, S and P atoms. The designation $\mathrm{C}_{1-500}\mathrm{N}_{0-100}\mathrm{O}_{0-100}\mathrm{S}_{0-10}\mathrm{P}_{0-10}\mathrm{H}_{\alpha}\mathrm{F}_{\beta}\mathrm{I}_{\delta}$ means that T^ms contains at least one, and may contain any number from 1 to 500, carbon atoms, in addition to optionally containing as many as 100 nitrogen atoms (the lower bound of 0 in $\mathrm{N}_{0-100}$ means that T^ms need not contain any nitrogen atoms), as many as 100 oxygen atoms, as many as 10 sulfur atoms and as many as 10 phosphorus atoms. The symbols α, β and δ represent the number of hydrogen, fluoride and iodide atoms in T^ms, where any two of these numbers may be zero, and where the sum of these numbers equals the total of the otherwise unsatisfied valencies of the C, N, O, S and P atoms. Preferably, T^ms has a molecular formula that falls within the scope of $\mathrm{C}_{1-50}\mathrm{N}_{0-10}\mathrm{O}_{0-10}\mathrm{H}_{\alpha}\mathrm{F}_{\beta}$, where α and β equal the numbers of hydrogen and fluoride atoms, respectively, present in the moiety.
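The element-count limits of the broad T^ms formula, and the mass increase from replacing hydrogen by fluoride noted in [0093], can be checked with a short Python sketch. The monoisotopic masses are standard values for the most abundant isotope of each element; the function names are hypothetical. The sketch deliberately does not check that α, β and δ satisfy the valences, since that depends on the bonding structure rather than on the element counts alone.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956,
    "S": 31.97207100, "P": 30.97376163, "F": 18.99840322, "I": 126.904473,
}
# Element-count limits of the broad formula; H, F and I are unbounded here.
LIMITS = {"C": (1, 500), "N": (0, 100), "O": (0, 100), "S": (0, 10), "P": (0, 10)}
ALLOWED = set(LIMITS) | {"H", "F", "I"}


def within_scope(formula: dict) -> bool:
    """Element-count check only; valence satisfaction is NOT verified."""
    if not set(formula) <= ALLOWED:
        return False  # e.g. chlorine or bromine is outside the formula
    if any(count < 0 for count in formula.values()):
        return False
    return all(lo <= formula.get(el, 0) <= hi for el, (lo, hi) in LIMITS.items())


def monoisotopic_mass(formula: dict) -> float:
    """Sum of monoisotopic element masses weighted by atom counts."""
    return sum(MONOISOTOPIC[el] * n for el, n in formula.items())


def fluorinate(formula: dict, n: int) -> dict:
    """Replace n hydrogen atoms by n fluorine atoms (one-for-one keeps valences)."""
    if not 0 <= n <= formula.get("H", 0):
        raise ValueError("cannot replace more hydrogens than are present")
    result = dict(formula)
    result["H"] = result.get("H", 0) - n
    result["F"] = result.get("F", 0) + n
    return result
```

For example, a methyl group {"C": 1, "H": 3} lies within scope with a mass of about 15.023 Da, and each hydrogen-to-fluoride replacement adds about 17.991 Da, so `fluorinate` offers a simple way to push a small T^ms toward the preferred mass range.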

b. Characteristics of IR Tags

[0095] There are two primary forms of IR detection of organic chemical groups: Raman scattering IR and absorption IR. Raman scattering IR spectra and absorption IR spectra are complementary spectroscopic methods. In general, Raman excitation depends on bond polarizability changes whereas IR absorption depends on bond dipole moment changes. Weak IR absorption lines become strong Raman lines and vice versa. Wavenumber is the characteristic unit for IR spectra. There are 3 spectral regions for IR tags which have separate applications: near IR at 12500 to 4000 cm⁻¹, mid IR at 4000 to 600 cm⁻¹, and far IR at 600 to 30 cm⁻¹. For the uses described herein, where a compound is to serve as a tag to identify an MOI, probe or primer, the mid spectral region would be preferred. For example, the carbonyl stretch (1850 to 1750 cm⁻¹) would be measured for carboxylic acids, carboxylic esters and amides, and alkyl and aryl carbonates, carbamates and ketones. N-H bending (1750 to 1600 cm⁻¹) would be used to identify amines, ammonium ions, and amides. At 1400 to 1250 cm⁻¹, R-OH bending is detected as well as the C-N stretch in amides. Aromatic substitution patterns are detected at 900 to 690 cm⁻¹ (C-H bending, N-H bending for ArNH2). Saturated C-H, olefins, aromatic rings, double and triple bonds, esters, acetals, ketals, ammonium salts, N-O compounds such as oximes, nitro, N-oxides, and nitrates, azo, hydrazones, quinones, carboxylic acids, amides, and lactams all possess vibrational infrared correlation data (see Pretsch et al., Spectral Data for Structure Determination of Organic Compounds, Springer-Verlag, New York, 1989). Preferred compounds would include an aromatic nitrile, which exhibits a very strong nitrile stretching vibration at 2230 to 2210 cm⁻¹. Other useful types of compounds are aromatic alkynes, which have a strong stretching vibration that gives rise to a sharp absorption band between 2140 and 2100 cm⁻¹. A third compound type is the aromatic azides, which exhibit an intense absorption band in the 2160 to 2120 cm⁻¹ region. Thiocyanates are representative of compounds that have a strong absorption at 2275 to 2263 cm⁻¹.
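The band positions quoted above can serve as a simple lookup when deciding which tag class an observed absorption might belong to. The Python sketch below is illustrative only: the region limits and tag bands are taken from this paragraph, but resolving the shared endpoints (600 and 4000 cm⁻¹) to the mid IR region is an arbitrary choice, and the function names `ir_region` and `candidate_tags` are hypothetical.

```python
# Regions in cm^-1, checked in this order; the shared boundaries at
# 600 and 4000 cm^-1 therefore resolve to "mid IR" (an assumption).
IR_REGIONS = [("mid IR", 600, 4000), ("near IR", 4000, 12500), ("far IR", 30, 600)]

# Characteristic bands (cm^-1, inclusive) for the tag classes named in the text.
TAG_BANDS = {
    "aromatic nitrile": (2210, 2230),
    "aromatic alkyne": (2100, 2140),
    "aromatic azide": (2120, 2160),
    "thiocyanate": (2263, 2275),
}


def ir_region(wavenumber):
    """Name the IR region containing a wavenumber, or None if outside 30-12500 cm^-1."""
    for name, low, high in IR_REGIONS:
        if low <= wavenumber <= high:
            return name
    return None


def candidate_tags(wavenumber):
    """Alphabetically sorted tag classes whose characteristic band contains the wavenumber."""
    return sorted(name for name, (low, high) in TAG_BANDS.items() if low <= wavenumber <= high)


print(ir_region(2220))       # mid IR
print(candidate_tags(2220))  # ['aromatic nitrile']
print(candidate_tags(2130))  # ['aromatic alkyne', 'aromatic azide']
```

All four tag bands fall inside the mid IR region, consistent with the preference stated above. The overlap between 2120 and 2140 cm⁻¹ also shows that, on band position alone, an aromatic alkyne tag and an aromatic azide tag could not be told apart in that window.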
c. Characteristics of UV Tags

[0096] A compilation of organic chromophore types and their respective UV-visible properties is given in Scott (Interpretation of the UV Spectra of Natural Products, Pergamon Press, New York, 1962). A chromophore is an atom or group of atoms or electrons that is responsible for the particular light absorption. Empirical rules exist for the π to π* maxima in conjugated systems (see Pretsch et al., Spectral Data for Structure Determination of Organic Compounds, pp. B65 and B70, Springer-Verlag, New York, 1989). Preferred compounds (with conjugated systems) would possess n to π* and π to π* transitions. Such compounds are exemplified by Acid Violet 7, Acridine Orange, Acridine Yellow G, Brilliant Blue G, Congo Red, Crystal Violet, Malachite Green oxalate, Metanil Yellow, Methylene Blue, Methyl Orange, Methyl Violet B, Naphthol Green B, Oil Blue N, Oil Red O, 4-phenylazophenol, Safranine O, Solvent Green 3, and Sudan Orange G, all of which are commercially available (Aldrich, Milwaukee, WI). Other suitable compounds are listed in, e.g., Jane, I., et al., J. Chrom. 323:191-225 (1985).

d. Characteristics of a Fluorescent Tag


[0097] Fluorescent probes are identified and quantitated most directly by their absorption and fluorescence emission wavelengths and intensities. Emission spectra (fluorescence and phosphorescence) are much more sensitive and permit more specific measurements than absorption spectra. Other photophysical characteristics such as excited-state lifetime and fluorescence anisotropy are less widely used. The most generally useful intensity parameters are the molar extinction coefficient ($\varepsilon$) for absorption and the quantum yield (QY) for fluorescence. The value of $\varepsilon$ is specified at a single wavelength (usually the absorption maximum of the probe), whereas QY is a measure of the total photon emission over the entire fluorescence spectral profile. A narrow optical bandwidth (<20 nm) is usually used for fluorescence excitation (via absorption), whereas the fluorescence detection bandwidth is much more variable, ranging from full spectrum for maximal sensitivity to narrow band (~20 nm) for maximal resolution. Fluorescence intensity per probe molecule is proportional to the product of $\varepsilon$ and QY. The range of these parameters among fluorophores of current practical importance is approximately 10,000 to 100,000 cm⁻¹M⁻¹ for $\varepsilon$ and 0.1 to 1.0 for QY. Compounds that can serve as fluorescent tags are as follows: fluorescein, rhodamine, lambda blue 470, lambda green, lambda red 664, lambda red 665, acridine orange, and propidium iodide, which are commercially available from Lambda Fluorescence Co. (Pleasant Gap, PA). Fluorescent compounds such as nile red, Texas Red, lissamine™ and BODIPY™s are available from Molecular Probes (Eugene, OR).
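Because fluorescence intensity per probe molecule is proportional to $\varepsilon \times$ QY, candidate fluorescent tags can be compared with a one-line calculation. The Python sketch below assumes the proportionality constant is the same for every tag compared (same instrument, excitation and detection settings). The function name and the values in the second comparison are hypothetical, chosen only to lie within the ranges quoted above.

```python
def relative_brightness(epsilon, quantum_yield):
    """Return epsilon * QY, a quantity proportional to fluorescence intensity per molecule.

    epsilon       -- molar extinction coefficient at the excitation wavelength (cm^-1 M^-1)
    quantum_yield -- fluorescence quantum yield, between 0 and 1
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not 0 <= quantum_yield <= 1:
        raise ValueError("quantum yield must lie in [0, 1]")
    return epsilon * quantum_yield


# Extremes of the practical ranges quoted in the text.
dimmest = relative_brightness(10_000, 0.1)
brightest = relative_brightness(100_000, 1.0)
print(round(brightest / dimmest))  # 100, i.e. a roughly 100-fold span

# Two hypothetical tags: high epsilon with modest QY versus the reverse.
# The second is brighter despite its lower epsilon.
print(relative_brightness(80_000, 0.3), relative_brightness(30_000, 0.9))
```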

e. Characteristics of Potentiometric Tags

[0098] The principle of electrochemical detection (ECD) is based on the oxidation or reduction of compounds: at certain applied voltages, electrons are either donated or accepted, producing a current which can be measured. When certain compounds are subjected to a potential difference, the molecules undergo a molecular rearrangement at the working electrode's surface with the loss (oxidation) or gain (reduction) of electrons; such compounds are said to be electroactive and undergo electrochemical reactions. EC detectors apply a voltage at an electrode surface over which the HPLC effluent flows. Electroactive compounds eluting from the column either donate electrons (oxidize) or acquire electrons (reduce), generating a current peak in real time. Importantly, the amount of current generated depends on both the concentration of the analyte and the voltage applied, with each compound having a specific voltage at which it begins to oxidize or reduce. The currently most popular electrochemical detector is the amperometric detector, in which the potential is kept constant and the current produced from the electrochemical reaction is then measured. This type of detection is currently called "potentiostatic amperometry". Commercial amperometers are available from ESA, Inc., Chelmsford, MA.

[0099] When the efficiency of detection is 100%, the specialized detectors are termed "coulometric". Coulometric detectors have a number of practical advantages with regard to selectivity and sensitivity which make these types of detectors useful in an array. In coulometric detectors, for a given concentration of analyte, the signal current is plotted as a function of the potential (voltage) applied to the working electrode. The resultant sigmoidal graph is called the current-voltage curve or hydrodynamic voltammogram (HDV). The HDV allows the best choice of applied potential to the working electrode, i.e., the potential that maximizes the observed signal. A major advantage of ECD is its inherent sensitivity, with current levels of detection in the subfemtomole range.
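The choice of applied potential from an HDV can be sketched numerically. The text states only that the curve is sigmoidal; the Python sketch below additionally assumes the ideal shape of a reversible (Nernstian) wave, $i(E) = i_{lim} / (1 + e^{-nF(E - E_{1/2})/RT})$, and picks the lowest potential at which the signal reaches a chosen fraction of its plateau. The function names, the 95% target and the 0.40 V half-wave potential are assumptions for illustration, not values from this document.

```python
import math

FARADAY = 96485.332      # C/mol
GAS_CONSTANT = 8.314462  # J/(mol*K)


def hdv_current(potential, e_half, i_lim, n=1, temperature=298.15):
    """Signal current at an applied potential (V), assuming an ideal Nernstian sigmoid."""
    slope = n * FARADAY / (GAS_CONSTANT * temperature)  # units of 1/V
    return i_lim / (1.0 + math.exp(-slope * (potential - e_half)))


def potential_for_fraction(e_half, fraction=0.95, n=1, temperature=298.15):
    """Lowest potential at which the current reaches `fraction` of its plateau.

    Obtained by inverting hdv_current: E = E_half + (RT/nF) * ln(f / (1 - f)).
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must lie strictly between 0 and 1")
    thermal_voltage = GAS_CONSTANT * temperature / (n * FARADAY)
    return e_half + thermal_voltage * math.log(fraction / (1.0 - fraction))


e_half = 0.40  # hypothetical half-wave potential, V
e_work = potential_for_fraction(e_half)
print(f"apply {e_work:.3f} V -> {hdv_current(e_work, e_half, 1.0):.2f} of the plateau")
```

Under these assumptions, a one-electron wave at room temperature reaches 95% of its plateau about 76 mV above the half-wave potential. Going further up the plateau adds little signal and, in practice, tends to bring in other electroactive species.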

[0100] Numerous chemicals and compounds are electrochemically active, including many biochemicals, pharmaceuticals and pesticides. Chromatographically co-eluting compounds can be effectively resolved even if their half-wave potentials (the potential at half signal maximum) differ by only 30-60 mV.

[0101] Recently developed coulometric sensors provide selectivity, identification and resolution of co-eluting compounds when used as detectors in liquid chromatography based separations. These arrayed detectors therefore add another set of separations accomplished in the detector itself. Current instruments possess 16 channels; the number of channels is in principle limited only by the rate at which data can be acquired. The number of compounds which can be resolved on the EC array is chromatographically limited (i.e., plate count limited). However, if two or more compounds that chromatographically co-elute have a difference in half-wave potentials of 30-60 mV, the array is able to distinguish the compounds. The ability of a compound to be electrochemically active relies on the possession of an EC active group (i.e., -OH, -O, -N, -S).
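A simple model illustrates how a coulometric array can separate co-eluting compounds whose half-wave potentials differ by only tens of millivolts. None of the following is stated in the text, so treat the Python sketch as an assumption-laden illustration. It assumes the electrodes sit in series at increasing potentials and that each electrode oxidizes an analyte up to the fraction given by an ideal Nernstian sigmoid. It also assumes the 16 channels are spaced 30 mV apart starting at 0.30 V. The half-wave potentials of 0.40 V and 0.44 V are hypothetical.

```python
import math

FARADAY = 96485.332      # C/mol
GAS_CONSTANT = 8.314462  # J/(mol*K)


def oxidized_fraction(potential, e_half, n=1, temperature=298.15):
    """Cumulative fraction of analyte oxidized at or below `potential` (ideal sigmoid)."""
    slope = n * FARADAY / (GAS_CONSTANT * temperature)
    return 1.0 / (1.0 + math.exp(-slope * (potential - e_half)))


def array_response(e_half, potentials):
    """Share of the analyte signal collected on each electrode of a serial array.

    Each electrode oxidizes what the upstream (lower-potential) electrodes left,
    so its share is the rise in the cumulative fraction since the previous one.
    """
    if any(b <= a for a, b in zip(potentials, potentials[1:])):
        raise ValueError("potentials must increase along the array")
    shares, previous = [], 0.0
    for potential in potentials:
        cumulative = oxidized_fraction(potential, e_half)
        shares.append(cumulative - previous)
        previous = cumulative
    return shares


def dominant_channel(shares):
    """Index of the electrode that collects the largest share of the signal."""
    return max(range(len(shares)), key=shares.__getitem__)


channels = [0.30 + 0.03 * k for k in range(16)]  # hypothetical 16-channel array, V
compound_a = array_response(0.40, channels)
compound_b = array_response(0.44, channels)
print(dominant_channel(compound_a), dominant_channel(compound_b))  # 4 5 in this model
```

In this model the two compounds peak on adjacent electrodes, and their response ratios on neighbouring channels also differ. An array detector can use this kind of signature even when the chromatography itself cannot separate the compounds.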

[0102] Compounds which have been successfully detected using coulometric detectors include 5-hydroxytryptamine, 3-methoxy-4-hydroxyphenylglycol, homogentisic acid, dopamine, metanephrine, 3-hydroxykynurenine, acetaminophen, 3-hydroxytryptophol, 5-hydroxyindoleacetic acid, octanesulfonic acid, phenol, o-cresol, pyrogallol, 2-nitrophenol, 4-nitrophenol, 2,4-dinitrophenol, 4,6-dinitrocresol, 3-methyl-2-nitrophenol, 2,4-dichlorophenol, 2,6-dichlorophenol, 2,4,5-trichlorophenol, 4-chloro-3-methylphenol, 5-methylphenol, 4-methyl-2-nitrophenol, 2-hydroxyaniline, 4-hydroxyaniline, 1,2-phenylenediamine, benzocatechin, buturon, chlortoluron, diuron, isoproturon, linuron, metobromuron, metoxuron, monolinuron, monuron, methionine, tryptophan, tyrosine, 4-aminobenzoic acid, 4-hydroxybenzoic acid, 4-hydroxycoumaric acid, 7-methoxycoumarin, apigenin, baicalein, caffeic acid, catechin, centaurein, chlorogenic acid, daidzein, datiscetin, diosmetin, epicatechin gallate, epigallocatechin, epigallocatechin gallate, eugenol, eupatorin, ferulic acid, fisetin, galangin, gallic acid, gardenin, genistein, gentisic acid, hesperidin, irigenin, kaempferol, leucocyanidin, luteolin, mangostin, morin, myricetin, naringin, narirutin, pelargonidin, peonidin, phloretin, pratensein, protocatechuic acid, rhamnetin, quercetin, sakuranetin, scutellarein, scopoletin, syringaldehyde, syringic acid, tangeritin, troxerutin, umbelliferone, vanillic acid, 1,3-dimethyltetrahydroisoquinoline, 6-hydroxydopamine, r-salsolinol, N-methyl-r-salsolinol, tetrahydroisoquinoline, amitriptyline, apomorphine, capsaicin, chlordiazepoxide, chlorpromazine, daunorubicin, desipramine, doxepin, fluoxetine, flurazepam, imipramine, isoproterenol, methoxamine, morphine, morphine-3-glucuronide, nortriptyline, oxazepam, phenylephrine, trimipramine, ascorbic acid, N-acetyl serotonin, 3,4-dihydroxybenzylamine, 3,4-dihydroxymandelic acid (DOMA), 3,4-dihydroxyphenylacetic acid (DOPAC), 3,4-dihydroxyphenylalanine (L-DOPA), 3,4-dihydroxyphenylglycol (DHPG), 3-hydroxyanthranilic acid, 2-hydroxyphenylacetic acid (2HPAC), 4-hydroxybenzoic acid (4HBAC), 5-hydroxyindole-3-acetic acid (5HIAA), 3-hydroxykynurenine, 3-hydroxymandelic acid, 3-hydroxy-4-methoxyphenylethylamine, 4-hydroxyphenylacetic acid (4HPAC), 4-hydroxyphenyllactic acid (4HPLA), 5-hydroxytryptophan (5HTP), 5-hydroxytryptophol (5HTOL), 5-hydroxytryptamine (5HT), 5-hydroxytryptamine sulfate, 3-methoxy-4-hydroxyphenylglycol (MHPG), 5-methoxytryptamine, 5-methoxytryptophan, 5-methoxytryptophol, 3-methoxytyramine (3MT), 3-methoxytyrosine (3-OM-DOPA), S-methylcysteine, 3-methylguanine, bufotenin, dopamine, dopamine-3-glucuronide, dopamine-3-sulfate, dopamine-4-sulfate, epinephrine, epinine, folic acid, glutathione (reduced), guanine, guanosine, homogentisic acid (HGA), homovanillic acid (HVA), homovanillyl alcohol (HVOL), homoveratric acid, HVA sulfate, hypoxanthine, indole, indole-3-acetic acid, indole-3-lactic acid, kynurenine, melatonin, metanephrine, N-methyltryptamine, N-methyltyramine, N,N-dimethyltryptamine, N,N-dimethyltyramine, norepinephrine, normetanephrine, octopamine, pyridoxal, pyridoxal phosphate, pyridoxamine, synephrine, tryptophol, tryptamine, tyramine, uric acid, vanillylmandelic acid (VMA), xanthine and xanthosine. Other suitable compounds are set forth in, e.g., Jane, I., et al., J. Chrom. 323:191-225 (1985) and Musch, G., et al., J. Chrom. 348:97-110 (1985). These compounds can be incorporated into compounds of formula T-L-X by methods known in the art. For example, compounds having a carboxylic acid group may be reacted with amine, hydroxyl, etc. to form amide, ester and other linkages between T and L.

[0103] In addition to the above properties, and regardless of the intended detection method, it is preferred that the tag have a modular chemical structure. This aids in the construction of large numbers of structurally related tags using the techniques of combinatorial chemistry. For example, the $T^{ms}$ group desirably has several properties. It desirably contains a functional group which supports a single ionized charge state when the $T^{ms}$-containing moiety is subjected to mass spectrometry (more simply referred to as a "mass spec sensitivity enhancer" group, or MSSE). Also, it desirably can serve as one member in a family of $T^{ms}$-containing moieties, where members of the family each have a different mass/charge ratio but have approximately the same sensitivity in the mass spectrometer. Thus, the members of the family desirably have the same MSSE. In order to allow the creation of families of compounds, it has been found convenient to generate tag reactants via a modular synthesis scheme, so that the tag components themselves may be viewed as comprising modules.

[0104] In a preferred modular approach to the structure of the $T^{ms}$ group, $T^{ms}$ has the formula

$T^2-(J-T^3-)_n-$

wherein $T^2$ is an organic moiety formed from carbon and one or more of hydrogen, fluoride, iodide, oxygen, nitrogen, sulfur and phosphorus, having a mass range of 15 to 500 daltons; $T^3$ is an organic moiety formed from carbon and one or more of hydrogen, fluoride, iodide, oxygen, nitrogen, sulfur and phosphorus, having a mass range of 50 to 1000 daltons; J is a direct bond or a functional group such as amide, ester, amine, sulfide, ether, thioester, disulfide, thioether, urea, thiourea, carbamate, thiocarbamate, Schiff base, reduced Schiff base, imine, oxime, hydrazone, phosphate, phosphonate, phosphoramide, phosphonamide, sulfonate, sulfonamide or carbon-carbon bond; and n is an integer ranging from 1 to 50, such that when n is greater than 1, each $T^3$ and J is independently selected.
[0105] The modular structure $T^2-(J-T^3)_n-$ provides a convenient entry to families of T-L-X compounds, where each member of the family has a different T group. For instance, when T is $T^{ms}$ and each family member desirably has the same MSSE, one of the $T^3$ groups can provide that MSSE structure. In order to provide variability between members of a family in terms of the mass of $T^{ms}$, the $T^2$ group may be varied among family members. For instance, one family member may have $T^2$ = methyl, while another has $T^2$ = ethyl, and another has $T^2$ = propyl, etc.
[0106] In order to provide "gross" or large jumps in mass, a $T^3$ group may be designed which adds a significant number of mass units (e.g., one or several hundred) to T-L-X. Such a $T^3$ group may be referred to as a molecular weight range adjuster group ("WRA"). A WRA is quite useful if one is working with a single set of $T^2$ groups, which will have masses extending over a limited range. A single set of $T^2$ groups may be used to create $T^{ms}$ groups having a wide range of mass simply by incorporating one or more WRA $T^3$ groups into the $T^{ms}$. Thus, using a simple example, if a set of $T^2$ groups affords a mass range of 250-340 daltons for the $T^{ms}$, the addition of a single WRA having, as an exemplary number, 100 daltons, as a $T^3$ group provides access to the mass range of 350-440 daltons while using the same set of $T^2$ groups. Similarly, the addition of two 100 dalton WRA groups (each as a $T^3$ group) provides access to the mass range of 450-540 daltons, where this incremental addition of WRA groups can be continued to provide access to a very large mass range for the $T^{ms}$ group. Preferred compounds of the formula $T^2-(J-T^3-)_n-L-X$ have the formula $R_{VWC}-(R_{WRA})_w-R_{MSSE}-L-X$, where VWC is a $T^2$ group, and each of the WRA and MSSE groups are $T^3$ groups. This structure is illustrated in Figure 13, and represents one modular approach to the preparation of $T^{ms}$.
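The mass-range arithmetic in this paragraph can be reproduced directly. In the Python sketch below, the function and parameter names are hypothetical. It shifts the base range afforded by a set of $T^2$ groups by one WRA increment per added WRA group, matching the worked numbers in the text.

```python
def wra_mass_ranges(base_range, wra_mass, max_wra):
    """List the T^ms mass ranges reachable with 0..max_wra WRA groups.

    base_range -- (low, high) mass range in daltons afforded by the T2 set alone
    wra_mass   -- mass added by each WRA group, in daltons
    """
    low, high = base_range
    return [(low + k * wra_mass, high + k * wra_mass) for k in range(max_wra + 1)]


for count, (low, high) in enumerate(wra_mass_ranges((250, 340), 100, 2)):
    print(f"{count} WRA group(s): {low}-{high} daltons")
# 0 WRA group(s): 250-340 daltons
# 1 WRA group(s): 350-440 daltons
# 2 WRA group(s): 450-540 daltons
```

Because the base range is 90 daltons wide and each WRA adds 100, successive ranges leave 10 dalton gaps (340-350, 440-450). A WRA mass no larger than the 90 dalton width of the base range would make successive ranges meet or overlap.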

[0107] In the formula $T^2-(J-T^3-)_n-$, $T^2$ and $T^3$ are preferably selected from hydrocarbyl, hydrocarbyl-O-hydrocarbylene, hydrocarbyl-S-hydrocarbylene, hydrocarbyl-NH-hydrocarbylene, hydrocarbyl-amide-hydrocarbylene, N-(hydrocarbyl)hydrocarbylene, N,N-di(hydrocarbyl)hydrocarbylene, hydrocarbylacyl-hydrocarbylene, heterocyclylhydrocarbyl wherein the heteroatom(s) are selected from oxygen, nitrogen, sulfur and phosphorus, and substituted heterocyclylhydrocarbyl wherein the heteroatom(s) are selected from oxygen, nitrogen, sulfur and phosphorus and the substituents are selected from hydrocarbyl, hydrocarbyl-O-hydrocarbylene, hydrocarbyl-NH-hydrocarbylene, hydrocarbyl-S-hydrocarbylene, N-(hydrocarbyl)hydrocarbylene, N,N-di(hydrocarbyl)hydrocarbylene and hydrocarbylacyl-hydrocarbylene. In addition, $T^2$ and/or $T^3$ may be a derivative of any of the previously listed potential $T^2$/$T^3$ groups, such that one or more hydrogens are replaced by fluorides.

[0108] Also regarding the formula $T^2-(J-T^3-)_n-$, a preferred $T^3$ has the formula $-G(R^2)-$, wherein G is an alkylene chain having a single $R^2$ substituent. Thus, if G is ethylene ($-CH_2-CH_2-$), either one of the two ethylene carbons may have an $R^2$ substituent, and $R^2$ is selected from alkyl, alkenyl, alkynyl, cycloalkyl, aryl-fused cycloalkyl, cycloalkenyl, aryl, aralkyl, aryl-substituted alkenyl or alkynyl, cycloalkyl-substituted alkyl, cycloalkenyl-substituted cycloalkyl, biaryl, alkoxy, alkenoxy, alkynoxy, aralkoxy, aryl-substituted alkenoxy or alkynoxy, alkylamino, alkenylamino or alkynylamino, aryl-substituted alkylamino, aryl-substituted alkenylamino or alkynylamino, aryloxy, arylamino, N-alkylurea-substituted alkyl, N-arylurea-substituted alkyl, alkylcarbonylamino-substituted alkyl, aminocarbonyl-substituted alkyl, heterocyclyl, heterocyclyl-substituted alkyl, heterocyclyl-substituted amino, carboxyalkyl-substituted aralkyl, oxocarbocyclyl-fused aryl and heterocyclylalkyl; cycloalkenyl, aryl-substituted alkyl and aralkyl, hydroxy-substituted alkyl, alkoxy-substituted alkyl, aralkoxy-substituted alkyl, alkoxy-substituted alkyl, aralkoxy-substituted alkyl, amino-substituted alkyl, (aryl-substituted alkyloxycarbonylamino)-substituted alkyl, thiol-substituted alkyl, alkylsulfonyl-substituted alkyl, (hydroxy-substituted alkylthio)-substituted alkyl, thioalkoxy-substituted alkyl, hydrocarbylacylamino-substituted alkyl, heterocyclylacylamino-substituted alkyl, hydrocarbyl-substituted-heterocyclylacylamino-substituted alkyl, alkylsulfonylamino-substituted alkyl, arylsulfonylamino-substituted alkyl, morpholino-alkyl, thiomorpholino-alkyl, morpholinocarbonyl-substituted alkyl, thiomorpholinocarbonyl-substituted alkyl, [N-(alkyl, alkenyl or alkynyl)- or N,N-(dialkyl, dialkenyl, dialkynyl or (alkyl, alkenyl))-amino]carbonyl-substituted alkyl, heterocyclylaminocarbonyl, heterocyclylalkyleneaminocarbonyl, heterocyclylaminocarbonyl-substituted alkyl, heterocyclylalkyleneaminocarbonyl-substituted alkyl, N,N-[dialkyl]alkyleneaminocarbonyl, N,N-[dialkyl]alkyleneaminocarbonyl-substituted alkyl, alkyl-substituted heterocyclylcarbonyl, alkyl-substituted heterocyclylcarbonyl-alkyl, carboxyl-substituted alkyl, dialkylamino-substituted acylaminoalkyl and amino acid side chains selected from arginine, asparagine, glutamine, S-methyl cysteine, methionine and corresponding sulfoxide and sulfone derivatives thereof, glycine, leucine, isoleucine, allo-isoleucine, tert-leucine, norleucine, phenylalanine, tyrosine, tryptophan, proline, alanine, ornithine, histidine, glutamine, valine, threonine, serine, aspartic acid, beta-cyanoalanine, and allothreonine; alkynyl and heterocyclylcarbonyl, aminocarbonyl, amido, mono- or dialkylaminocarbonyl, mono- or diarylaminocarbonyl, alkylarylaminocarbonyl, diarylaminocarbonyl, mono- or diacylaminocarbonyl, aromatic or aliphatic acyl, and alkyl optionally substituted by substituents selected from amino, carboxy, hydroxy, mercapto, mono- or dialkylamino, mono- or diarylamino, alkylarylamino, diarylamino, mono- or diacylamino, alkoxy, alkenoxy, aryloxy, thioalkoxy, thioalkenoxy, thioalkynoxy, thioaryloxy and heterocyclyl.
[0109] A preferred compound of the formula $T^2-(J-T^3-)_n-L-X$ has the structure:

[Structure not reproduced in this text.]

wherein G is $(CH_2)_{1-6}$ such that a hydrogen on one and only one of the $CH_2$ groups represented by a single "G" is replaced with $-(CH_2)_c-\text{Amide}-T^4$; $T^2$ and $T^4$ are organic moieties of the formula $C_{1-25}N_{0-9}O_{0-9}H_\alpha F_\beta$ such that the sum of $\alpha$ and $\beta$ is sufficient to satisfy the otherwise unsatisfied valencies of the C, N, and O atoms; Amide is $-N(R^1)-C(=O)-$ or $-C(=O)-N(R^1)-$; $R^1$ is hydrogen or $C_{1-10}$ alkyl; c is an integer ranging from 0 to 4; and n is an integer ranging from 1 to 50 such that when n is greater than 1, G, c, Amide, $R^1$ and $T^4$ are independently selected.

[0110] In a further preferred embodiment, a compound of the formula $T^2-(J-T^3-)_n-L-X$ has the structure:

[Structure not reproduced in this text.]

wherein $T^5$ is an organic moiety of the formula $C_{1-25}N_{0-9}O_{0-9}H_\alpha F_\beta$ such that the sum of $\alpha$ and $\beta$ is sufficient to satisfy the otherwise unsatisfied valencies of the C, N, and O atoms; $T^5$ includes a tertiary or quaternary amine or an organic acid; m is an integer ranging from 0 to 49; and $T^2$, $T^4$, $R^1$, L and X have been previously defined.
[0111] Another preferred compound having the formula $T^2-(J-T^3-)_n-L-X$ has the particular structure:

[Structure not reproduced in this text.]

wherein $T^5$ is an organic moiety of the formula $C_{1-25}N_{0-9}O_{0-9}H_\alpha F_\beta$ such that the sum of $\alpha$ and $\beta$ is sufficient to satisfy the otherwise unsatisfied valencies of the C, N, and O atoms; $T^5$ includes a tertiary or quaternary amine or an organic acid; m is an integer ranging from 0 to 49; and $T^2$, $T^4$, c, $R^1$, "Amide", L and X have been previously defined.
[0112] In the above structures that have a $T^5$ group, $-\text{Amide}-T^5$ is preferably one of the following, which are conveniently made by reacting organic acids with free amino groups extending from "G":

[Structures not reproduced in this text; the legible fragments show amide groups of the form -NHC(=O)-(C1-C10)-N... and -NHC(=O)-(C0-C10)-...]

[0113] Where the above compounds have a $T^5$ group, and the "G" group has a free carboxyl group (or reactive equivalent thereof), then the following are preferred $-\text{Amide}-T^5$ groups, which may conveniently be prepared by reacting the appropriate organic amine with a free carboxyl group extending from a "G" group:

[Structures not reproduced in this text; the legible fragments show amide groups of the form -C(=O)NH-(C1-C10)-..., -C(=O)NH-(C2-C10)-N... and -C(=O)NH-(C2-C10)-N(C1-C10)2.]

[0114] In three preferred embodiments of the invention, T-L-MOI has the structure:

[Structure not reproduced in this text; it terminates in an -ODN-3'-OH group.]

or the structure:

[Structure not reproduced in this text.]

or the structure:

[Structure not reproduced in this text; it terminates in a -(C1-C10)-ODN-3'-OH group.]

wherein $T^2$ and $T^4$ are organic moieties of the formula $C_{1-25}N_{0-9}O_{0-9}S_{0-3}P_{0-3}H_\alpha F_\beta I_\delta$ such that the sum of $\alpha$, $\beta$ and $\delta$ is sufficient to satisfy the otherwise unsatisfied valencies of the C, N, O, S and P atoms; G is $(CH_2)_{1-6}$ wherein one and only one hydrogen on the $CH_2$ groups represented by each G is replaced with $-(CH_2)_c-\text{Amide}-T^4$; Amide is $-N(R^1)-C(=O)-$ or $-C(=O)-N(R^1)-$; $R^1$ is hydrogen or $C_{1-10}$ alkyl; c is an integer ranging from 0 to 4; "$C_2-C_{10}$" represents a hydrocarbylene group having from 2 to 10 carbon atoms; "ODN-3'-OH" represents a nucleic acid fragment having a terminal 3' hydroxyl group (i.e., a nucleic acid fragment joined to $(C_1-C_{10})$ at other than the 3' end of the nucleic acid fragment); and n is an integer ranging from 1 to 50 such that when n is greater than 1, then G, c, Amide, $R^1$ and $T^4$ are independently selected. Preferably there are not three heteroatoms bonded to a single carbon atom, wherein $T^2$ and $T^4$ are organic moieties of the formula $C_{1-25}N_{0-9}O_{0-9}H_\alpha F_\beta$ such that the sum of $\alpha$ and $\beta$ is sufficient to satisfy the otherwise unsatisfied valencies of the C, N, and O atoms; G is $(CH_2)_{1-6}$ wherein one and only one hydrogen on the $CH_2$ groups represented by each G is replaced with $-(CH_2)_c-\text{Amide}-T^4$; Amide is $-N(R^1)-C(=O)-$ or $-C(=O)-N(R^1)-$; $R^1$ is hydrogen or $C_{1-10}$ alkyl; c is an integer ranging from 0 to 4; "ODN-3'-OH" represents a nucleic acid fragment having a terminal 3' hydroxyl group; and n is an integer ranging from 1 to 50 such that when n is greater than 1, G, c, Amide, $R^1$ and $T^4$ are independently selected.
[0115] In structures as set forth above that contain a $T^2-C(=O)-N(R^1)-$ group, this group may be formed by reacting an amine of the formula $HN(R^1)-$ with an organic acid selected from the following, which are exemplary only and do not constitute an exhaustive list of potential organic acids: Formic acid, Acetic acid, Propiolic acid, Propionic acid, Fluoroacetic acid, 2-Butynoic acid, Cyclopropanecarboxylic acid, Butyric acid, Methoxyacetic acid, Difluoroacetic acid, 4-Pentynoic acid, Cyclobutanecarboxylic acid, 3,3-Dimethylacrylic acid, Valeric acid, N,N-Dimethylglycine, N-Formyl-Gly-OH, Ethoxyacetic acid, (Methylthio)acetic acid, Pyrrole-2-carboxylic acid, 3-Furoic acid, Isoxazole-5-carboxylic acid, trans-3-Hexenoic acid, Trifluoroacetic acid, Hexanoic acid, Ac-Gly-OH, 2-Hydroxy-2-methylbutyric acid, Benzoic acid, Nicotinic acid, 2-Pyrazinecarboxylic acid, 1-Methyl-2-pyrrolecarboxylic acid, 2-Cyclopentene-1-acetic acid, Cyclopentylacetic acid, (S)-(-)-2-Pyrrolidone-5-carboxylic acid, N-Methyl-L-proline, Heptanoic acid, Ac-b-Ala-OH, 2-Ethyl-2-hydroxybutyric acid, 2-(2-Methoxyethoxy)acetic acid, p-Toluic acid, 6-Methylnicotinic acid, 5-Methyl-2-pyrazinecarboxylic acid, 2,5-Dimethylpyrrole-3-carboxylic acid, 4-Fluorobenzoic acid, 3,5-Dimethylisoxazole-4-carboxylic acid, 3-Cyclopentylpropionic acid, Octanoic acid, N,N-Dimethylsuccinamic acid, Phenylpropiolic acid, Cinnamic acid, 4-Ethylbenzoic acid, p-Anisic acid, 1,2,5-Trimethylpyrrole-3-carboxylic acid, 3-Fluoro-4-methylbenzoic acid, Ac-DL-Propargylglycine, 3-(Trifluoromethyl)butyric acid, 1-Piperidinepropionic acid, N-Acetylproline, 3,5-Difluorobenzoic acid, Ac-L-Val-OH, Indole-2-carboxylic acid, 2-Benzofurancarboxylic acid, Benzotriazole-5-carboxylic acid, 4-n-Propylbenzoic acid, 3-Dimethylaminobenzoic acid, 4-Ethoxybenzoic acid, 4-(Methylthio)benzoic acid, N-(2-Furoyl)glycine, 2-(Methylthio)nicotinic acid, 3-Fluoro-4-methoxybenzoic acid, Tfa-Gly-OH, 2-Naphthoic acid, Quinaldic acid, Ac-L-Ile-OH, 3-Methylindene-2-carboxylic acid, 2-Quinoxalinecarboxylic acid, 1-Methylindole-2-carboxylic acid, 2,3,6-Trifluorobenzoic acid, N-Formyl-L-Met-OH, 2-[2-(2-Methoxyethoxy)ethoxy]acetic acid, 4-n-Butylbenzoic acid, N-Benzoylglycine, 5-Fluoroindole-2-carboxylic acid, 4-n-Propoxybenzoic acid, 4-Acetyl-3,5-dimethyl-2-pyrrolecarboxylic acid, 3,5-Dimethoxybenzoic acid, 2,6-Dimethoxynicotinic acid, Cyclohexanepentanoic acid, 2-Naphthylacetic acid, 4-(1H-Pyrrol-1-yl)benzoic acid, Indole-3-propionic acid, m-Trifluoromethylbenzoic acid, 5-Methoxyindole-2-carboxylic acid, 4-Pentylbenzoic acid, Bz-b-Ala-OH, 4-Diethylaminobenzoic acid, 4-n-Butoxybenzoic acid, 3-Methyl-5-CF3-isoxazole-4-carboxylic acid, (3,4-Dimethoxyphenyl)acetic acid, 4-Biphenylcarboxylic acid, Pivaloyl-Pro-OH, Octanoyl-Gly-OH, (2-Naphthoxy)acetic acid, Indole-3-butyric acid, 4-(Trifluoromethyl)phenylacetic acid, 5-Methoxyindole-3-acetic acid, 4-(Trifluoromethoxy)benzoic acid, Ac-L-Phe-OH, 4-Pentyloxybenzoic acid, Z-Gly-OH, 4-Carboxy-N-(fur-2-ylmethyl)pyrrolidin-2-one, 3,4-Diethoxybenzoic acid, 2,4-Dimethyl-5-CO2Et-pyrrole-3-carboxylic acid, N-(2-Fluorophenyl)succinamic acid, 3,4,5-Trimethoxybenzoic acid, N-Phenylanthranilic acid, 3-Phenoxybenzoic acid, Nonanoyl-Gly-OH, 2-Phenoxypyridine-3-carboxylic acid, 2,5-Dimethyl-1-phenylpyrrole-3-carboxylic acid, trans-4-(Trifluoromethyl)cinnamic acid, (5-Methyl-2-phenyloxazol-4-yl)acetic acid, 4-(2-Cyclohexenyloxy)benzoic acid, 5-Methoxy-2-methylindole-3-acetic acid, trans-4-Cotininecarboxylic acid, Bz-5-Aminovaleric acid, 4-Hexyloxybenzoic acid, N-(3-Methoxyphenyl)succinamic acid, Z-Sar-OH, 4-(3,4-Dimethoxyphenyl)butyric acid, Ac-o-Fluoro-DL-Phe-OH, N-(4-Fluorophenyl)glutaramic acid, 4'-Ethyl-4-biphenylcarboxylic acid, 1,2,3,4-Tetrahydroacridinecarboxylic acid, 3-Phenoxyphenylacetic acid, N-(2,4-Difluorophenyl)succinamic acid, N-Decanoyl-Gly-OH, (+)-6-Methoxy-α-methyl-2-naphthaleneacetic acid, 3-(Trifluoromethoxy)cinnamic acid, N-Formyl-DL-Trp-OH, (R)-(+)-α-Methoxy-α-(trifluoromethyl)phenylacetic acid, Bz-DL-Leu-OH, 4-(Trifluoromethoxy)phenoxyacetic acid, 4-Heptyloxybenzoic acid, 2,3,4-Trimethoxycinnamic acid, 2,6-Dimethoxybenzoyl-Gly-OH, 3-(3,4,5-Trimethoxyphenyl)propionic acid, 2,3,4,5,6-Pentafluorophenoxyacetic acid, N-(2,4-Difluorophenyl)glutaramic acid, N-Undecanoyl-Gly-OH, 2-(4-Fluorobenzoyl)benzoic acid, 5-Trifluoromethoxyindole-2-carboxylic acid, N-(2,4-Difluorophenyl)diglycolamic acid, Ac-L-Trp-OH, Tfa-L-Phenylglycine-OH, 3-Iodobenzoic acid, 3-(4-n-Pentylbenzoyl)propionic acid, 2-Phenyl-4-quinolinecarboxylic acid, 4-Octyloxybenzoic acid, Bz-L-Met-OH, 3,4,5-Triethoxybenzoic acid, N-Lauroyl-Gly-OH, 3,5-Bis(trifluoromethyl)benzoic acid, Ac-5-Methyl-DL-Trp-OH, 2-Iodophenylacetic acid, 3-Iodo-4-methylbenzoic acid, 3-(4-n-Hexylbenzoyl)propionic acid, N-Hexanoyl-L-Phe-OH, 4-Nonyloxybenzoic acid, 4'-(Trifluoromethyl)-2-biphenylcarboxylic acid, Bz-L-Phe-OH, N-Tridecanoyl-Gly-OH, 3,5-Bis(trifluoromethyl)phenylacetic acid, 3-(4-n-Heptylbenzoyl)propionic acid, N-Heptanoyl-L-Phe-OH, 4-Decyloxybenzoic acid, N-(α,α,α-trifluoro-m-tolyl)anthranilic acid, Niflumic acid, 4-(2-Hydroxyhexafluoroisopropyl)benzoic acid, N-Myristoyl-Gly-OH, 3-(4-n-Octylbenzoyl)propionic acid, N-Octanoyl-L-Phe-OH, 4-Undecyloxybenzoic acid, 3-(3,4,5-Trimethoxyphenyl)propionyl-Gly-OH, 8-Iodonaphthoic acid, N-Pentadecanoyl-Gly-OH, 4-Dodecyloxybenzoic acid, N-Palmitoyl-Gly-OH, and N-Stearoyl-Gly-OH. These organic acids are available from one or more of Advanced ChemTech, Louisville, KY; Bachem Bioscience Inc., Torrance, CA; Calbiochem-Novabiochem Corp., San Diego, CA; Farchan Laboratories Inc., Gainesville, FL; Lancaster Synthesis, Windham, NH; and MayBridge Chemical Company (c/o Ryan Scientific), Columbia, SC. The catalogs from these companies use the abbreviations which are used above to identify the acids.
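The $T^2-C(=O)-N(R^1)-$ group is formed by condensing an organic acid with an amine, so the resulting amide weighs the sum of the two reactants minus one water molecule. The Python sketch below uses average atomic masses and three acids from the list above (formic, acetic and propionic acid). The helper names are hypothetical, and the 200 dalton amine mass is a placeholder, not a value from the text.

```python
# Average atomic masses (g/mol) for the elements needed here.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def formula_mass(counts):
    """Average mass of a composition given as {element: count}."""
    return sum(ATOMIC_MASS[element] * n for element, n in counts.items())


WATER = formula_mass({"H": 2, "O": 1})  # lost in the condensation

ACIDS = {
    "Formic acid": {"C": 1, "H": 2, "O": 2},
    "Acetic acid": {"C": 2, "H": 4, "O": 2},
    "Propionic acid": {"C": 3, "H": 6, "O": 2},
}


def amide_mass(acid_counts, amine_mass):
    """Mass of the amide formed from an acid and an amine, losing one H2O."""
    return formula_mass(acid_counts) + amine_mass - WATER


AMINE_MASS = 200.0  # hypothetical mass of the complete amine reactant, daltons
for name, counts in ACIDS.items():
    print(f"{name}: {amide_mass(counts, AMINE_MASS):.3f}")
# Formic acid: 228.010
# Acetic acid: 242.037
# Propionic acid: 256.064
```

Consecutive members differ by one $CH_2$ unit (about 14.03 daltons). Paragraph [0105] obtains the same kind of small, regular mass step by varying $T^2$ from methyl to ethyl to propyl.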


f. Combinatorial Chemistry as a Means for Preparing Tags

[0116] Combinatorial chemistry is a type of synthetic strategy which leads to the production of large chemical libraries (see, for example, PCT Application Publication No. WO 94/08051). These combinatorial libraries can be used as tags for the identification of molecules of interest (MOIs). Combinatorial chemistry may be defined as the systematic and repetitive, covalent connection of a set of different "building blocks" of varying structures to each other to yield a large array of diverse molecular entities. Building blocks can take many forms, both naturally occurring and synthetic, such as nucleophiles, electrophiles, dienes, alkylating or acylating agents, diamines, nucleotides, amino acids, sugars, lipids, organic monomers, synthons, and combinations of the above. Chemical reactions used to connect the building blocks may involve alkylation, acylation, oxidation, reduction, hydrolysis, substitution, elimination, addition, cyclization, condensation, and the like. This process can produce libraries of compounds which are oligomeric, non-oligomeric, or combinations thereof. If oligomeric, the compounds can be branched, unbranched, or cyclic. Examples of oligomeric structures which can be prepared by combinatorial methods include oligopeptides, oligonucleotides, oligosaccharides, polylipids, polyesters, polyamides, polyurethanes, polyureas, polyethers, poly(phosphorus derivatives), e.g., phosphates, phosphonates, phosphoramides, phosphonamides, phosphites, phosphinamides, etc., and poly(sulfur derivatives), e.g., sulfones, sulfonates, sulfites, sulfonamides, sulfenamides, etc.

[0117] One common type of oligomeric combinatorial library is the peptide combinatorial library. Recent innovations in peptide chemistry and molecular biology have enabled libraries consisting of tens to hundreds of millions of different peptide sequences to be prepared and used. Such libraries can be divided into three broad categories. One category of libraries involves the chemical synthesis of soluble non-support-bound peptide libraries (e.g., Houghten et al., Nature 354:84, 1991). A second category involves the chemical synthesis of support-bound peptide libraries, presented on solid supports such as plastic pins, resin beads, or cotton (Geysen et al., Mol. Immunol. 23:709, 1986; Lam et al., Nature 354:82, 1991; Eichler and Houghten, Biochemistry 32:11035, 1993). In these first two categories, the building blocks are typically L-amino acids, D-amino acids, unnatural amino acids, or some mixture or combination thereof. A third category uses molecular biology approaches to prepare peptides or proteins on the surface of filamentous phage particles or plasmids (Scott and Craig, Curr. Opinion Biotech. 5:40, 1994). Soluble, non-support-bound peptide libraries appear to be suitable for a number of applications, including use as tags. The available repertoire of chemical diversities in peptide libraries can be expanded by steps such as permethylation (Ostresh et al., Proc. Natl. Acad. Sci. USA 91:11138, 1994).

[0118] Numerous variants of peptide combinatorial libraries are possible in which the peptide backbone is modified, and/or the amide bonds have been replaced by mimetic groups. Amide mimetic groups which may be used include ureas, urethanes, and carbonylmethylene groups. Restructuring the backbone such that side chains emanate from the amide nitrogens of each amino acid, rather than the alpha-carbons, gives libraries of compounds known as peptoids (Simon et al., Proc. Natl. Acad. Sci. USA 89:9367, 1992).

[0119] Another common type of oligomeric combinatorial library is the oligonucleotide combinatorial library, where the building blocks are some form of naturally occurring or unnatural nucleotide or polysaccharide derivatives, including where various organic and inorganic groups may substitute for the phosphate linkage, and nitrogen or sulfur may substitute for oxygen in an ether linkage (Schneider et al., Biochem. 34:9599, 1995; Freier et al., J. Med. Chem. 38:344, 1995; Frank, J. Biotechnology 41:259, 1995; Schneider et al., Published PCT WO 942052; Ecker et al., Nucleic Acids Res. 21:1853, 1993).

[0120] More recently, the combinatorial production of collections of non-oligomeric, small-molecule compounds has been described (DeWitt et al., Proc. Natl. Acad. Sci. USA 90:690, 1993; Bunin et al., Proc. Natl. Acad. Sci. USA 91:4708, 1994). Structures suitable for elaboration into small-molecule libraries encompass a wide variety of organic molecules, for example heterocyclics, aromatics, alicyclics, aliphatics, steroids, antibiotics, enzyme inhibitors, ligands, hormones, drugs, alkaloids, opioids, terpenes, porphyrins, toxins, catalysts, as well as combinations thereof.

g. Specific Methods for Combinatorial Synthesis of Tags

[0121] Two methods for the preparation and use of a diverse set of amine-containing MS tags are outlined below. In both methods, solid phase synthesis is employed to enable simultaneous parallel synthesis of a large number of tagged linkers, using the techniques of combinatorial chemistry. In the first method, the eventual cleavage of the tag from the oligonucleotide results in liberation of a carboxylic amide. In the second method, cleavage of the tag produces a carboxylic acid. The chemical components and linking elements used in these methods are abbreviated as follows:



R = resin
FMOC = fluorenylmethoxycarbonyl protecting group
All = allyl protecting group
CO2H = carboxylic acid group
CONH2 = carboxylic amide group
NH2 = amino group
OH = hydroxyl group
CONH = amide linkage
COO = ester linkage
NH2-Rink-CO2H = 4-[(α-amino)-2,4-dimethoxybenzyl]phenoxybutyric acid (Rink linker)
OH-1MeO-CO2H = (4-hydroxymethyl)phenoxybutyric acid
OH-2MeO-CO2H = (4-hydroxymethyl-3-methoxy)phenoxyacetic acid
NH2-A-COOH = amino acid with aliphatic or aromatic amine functionality in side chain
X1 ... Xn-COOH = set of n diverse carboxylic acids with unique molecular weights
oligo1 ... oligo(n) = set of n oligonucleotides
HBTU = O-benzotriazol-1-yl-N,N,N',N'-tetramethyluronium hexafluorophosphate

The sequence of steps in Method 1 is as follows:

OH-2MeO-CONH-R
↓ FMOC-NH-Rink-CO2H; couple (e.g., HBTU)
FMOC-NH-Rink-COO-2MeO-CONH-R
↓ piperidine (remove FMOC)
NH2-Rink-COO-2MeO-CONH-R
↓ FMOC-NH-A-COOH; couple (e.g., HBTU)
FMOC-NH-A-CONH-Rink-COO-2MeO-CONH-R
↓ piperidine (remove FMOC)
NH2-A-CONH-Rink-COO-2MeO-CONH-R
↓ divide into n aliquots
↓↓↓ couple to n different acids X1 ... Xn-COOH
X1 ... Xn-CONH-A-CONH-Rink-COO-2MeO-CONH-R
↓↓↓ cleave tagged linkers from resin with 1% TFA
X1 ... Xn-CONH-A-CONH-Rink-CO2H
↓↓↓ couple to n oligos (oligo1 ... oligo(n)) (e.g., via Pfp esters)
X1 ... Xn-CONH-A-CONH-Rink-CONH-oligo1 ... oligo(n)
↓ pool tagged oligos
↓ perform sequencing reaction
↓ separate different length fragments from sequencing reaction (e.g., via HPLC or CE)
↓ cleave tags from linkers with 25-100% TFA
X1 ... Xn-CONH-A-CONH2
↓ analyze by mass spectrometry
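Method 1 depends on the n acids giving tags "with unique molecular weights". The Python sketch below computes monoisotopic masses of the cleaved tags from elemental formulas and reports the smallest spacing between them. The four acids and the choice of 4-aminobenzoic acid as A are hypothetical, and the bookkeeping assumes each coupling is a simple condensation that loses one H2O.

```python
import re
from collections import Counter
from itertools import combinations

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONO = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}

def parse(formula):
    """Turn a simple formula such as 'C7H7NO2' into an element Counter."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts

def mass(counts):
    """Monoisotopic mass of an element Counter."""
    return sum(MONO[el] * n for el, n in counts.items())

H2O, NH3 = parse("H2O"), parse("NH3")

def method1_tag(acid, amino_acid):
    """X-CONH-A-CONH2: condensation (-H2O), then Rink cleavage leaves an amide (+NH3, -H2O)."""
    counts = parse(acid) + parse(amino_acid) + NH3
    counts.subtract(H2O + H2O)
    return counts

def method2_tag(acid, amino_acid):
    """X-CONH-A-COOH: condensation (-H2O); cleavage leaves the free carboxylic acid."""
    counts = parse(acid) + parse(amino_acid)
    counts.subtract(H2O)
    return counts

# Hypothetical building blocks: four simple acids, with 4-aminobenzoic acid as A.
ACIDS = {"acetic": "C2H4O2", "propionic": "C3H6O2", "butyric": "C4H8O2", "benzoic": "C7H6O2"}
A = "C7H7NO2"

tag_masses = {name: mass(method1_tag(f, A)) for name, f in ACIDS.items()}
min_gap = min(abs(a - b) for a, b in combinations(tag_masses.values(), 2))
for name, m in sorted(tag_masses.items(), key=lambda kv: kv[1]):
    print(f"{name:>10s}  {m:9.4f} Da")
print(f"smallest spacing between tags: {min_gap:.4f} Da")
```

With this bookkeeping, a tag released as the free acid (Method 2) is about 0.984 Da heavier than the corresponding amide released in Method 1, which is worth bearing in mind if tags from both schemes are ever read on the same instrument.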

The sequence of steps in Method 2 is as follows:

OH-1MeO-CO2-All
↓ FMOC-NH-A-CO2H; couple (e.g., HBTU)
FMOC-NH-A-COO-1MeO-CO2-All
↓ palladium (remove allyl)
FMOC-NH-A-COO-1MeO-CO2H
↓ OH-2MeO-CONH-R; couple (e.g., HBTU)
FMOC-NH-A-COO-1MeO-COO-2MeO-CONH-R
↓ piperidine (remove FMOC)
NH2-A-COO-1MeO-COO-2MeO-CONH-R
↓ divide into n aliquots
↓↓↓ couple to n different acids X1 ... Xn-CO2H
X1 ... Xn-CONH-A-COO-1MeO-COO-2MeO-CONH-R
↓↓↓ cleave tagged linkers from resin with 1% TFA
X1 ... Xn-CONH-A-COO-1MeO-CO2H
↓↓↓ couple to n oligos (oligo1 ... oligo(n)) (e.g., via Pfp esters)
X1 ... Xn-CONH-A-COO-1MeO-CONH-oligo1 ... oligo(n)
↓ pool tagged oligos
↓ perform sequencing reaction
↓ separate different length fragments from sequencing reaction (e.g., via HPLC or CE)
↓ cleave tags from linkers with 25-100% TFA
X1 ... Xn-CONH-A-CO2H
↓ analyze by mass spectrometry
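The read-out at the end of both schemes can be illustrated with a simplified decoder. Fragments are ordered by length, and the mass of the tag released from each fragment is looked up in a table that correlates each tag with a nucleotide. This is a Python sketch only. The tag masses and the 0.01 Da matching tolerance are hypothetical, and it assumes exactly one tag signal per fragment length.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    length: int       # fragment length in nucleotides, from the size separation
    tag_mass: float   # mass (Da) measured for the tag cleaved from this fragment

def call_sequence(fragments, tag_to_base, tol=0.01):
    """Order fragments by length and translate each tag mass into a base."""
    ordered = sorted(fragments, key=lambda f: f.length)
    lengths = [f.length for f in ordered]
    if len(set(lengths)) != len(lengths):
        raise ValueError("more than one tag signal at the same fragment length")
    bases = []
    for frag in ordered:
        hits = [base for m, base in tag_to_base.items() if abs(m - frag.tag_mass) <= tol]
        if len(hits) != 1:
            raise ValueError(f"cannot assign tag mass {frag.tag_mass} at length {frag.length}")
        bases.append(hits[0])
    return "".join(bases)

# Hypothetical tag masses, one tag per nucleotide.
TAG_TO_BASE = {200.0: "A", 214.0: "C", 228.0: "G", 242.0: "T"}

reads = [Fragment(3, 228.001), Fragment(1, 200.002), Fragment(4, 214.0), Fragment(2, 241.998)]
print(call_sequence(reads, TAG_TO_BASE))
```

Raising on ambiguous or unmatched masses, rather than guessing, keeps a mis-assigned tag from silently corrupting the called sequence.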
[0122] A "linker" component (or L), as used herein, means either a direct covalent bond or an organic chemical group which is used to connect a "tag" (or T) to a "molecule of interest" (or MOI) through covalent chemical bonds. In addition, the direct bond itself, or one or more bonds within the linker component, is cleavable under conditions which allow T to be released (in other words, cleaved) from the remainder of the T-L-X compound (including the MOI component). The tag variable component which is present within T should be stable to the cleavage conditions. Preferably, the cleavage can be accomplished rapidly: within a few minutes, and preferably within about 15 seconds or less.
[0123] In general, a linker is used to connect each of a large set of tags to each of a similarly large set of MOIs. Typically, a single tag-linker combination is attached to each MOI (to give various T-L-MOI), but in some cases more than one tag-linker combination may be attached to each individual MOI (to give various (T-L)n-MOI). In another embodiment of the present invention, two or more tags are bonded to a single linker through multiple, independent sites on the linker, and this multiple tag-linker combination is then bonded to an individual MOI (to give various (T)n-L-MOI).
[0124] After various manipulations of the set of tagged MOIs, special chemical and/or physical conditions are used to cleave one or more covalent bonds in the linker, resulting in the liberation of the tags from the MOIs. The cleavable bond(s) may or may not be some of the same bonds that were formed when the tag, linker, and MOI were connected together. The design of the linker will, in large part, determine the conditions under which cleavage may be accomplished. Accordingly, linkers may be identified by the cleavage conditions to which they are particularly susceptible. When a linker is photolabile (i.e., prone to cleavage by exposure to actinic radiation), the linker may be given the designation L^hν. Likewise, the designations L^acid, L^base, L^[O], L^[R], L^enz, L^elc, L^Δ and L^ss may be used to refer to linkers that are particularly susceptible to cleavage by acid, base, chemical oxidation, chemical reduction, the catalytic activity of an enzyme (more simply "enzyme"), electrochemical oxidation or reduction, elevated temperature ("thermal") and thiol exchange, respectively.

[0125] Certain types of linker are labile to a single type of cleavage condition, whereas others are labile to several types of cleavage conditions. In addition, in linkers which are capable of bonding multiple tags (to give (T)n-L-MOI type structures), each of the tag-bonding sites may be labile to different cleavage conditions. For example, in a linker having two tags bonded to it, one of the tags may be labile only to base, and the other labile only to photolysis.
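The per-site lability described in [0125] can be modeled as a small lookup: each tag-bonding site on a (T)n-L-MOI conjugate records the conditions it is labile to, and applying one condition releases only the matching tags. A minimal Python sketch; the enum simply names the cleavage conditions listed in [0124], and the tag names are hypothetical.

```python
from enum import Enum, auto

class Cleavage(Enum):
    """Cleavage conditions listed in [0124]."""
    PHOTOLYSIS = auto()
    ACID = auto()
    BASE = auto()
    OXIDATION = auto()
    REDUCTION = auto()
    ENZYME = auto()
    ELECTROCHEMICAL = auto()
    THERMAL = auto()
    THIOL_EXCHANGE = auto()

def release(tag_sites, condition):
    """Split tags into (released, still_attached) after applying one condition.

    tag_sites maps each tag name to the set of Cleavage conditions its bonding site is labile to.
    """
    released = sorted(t for t, labile in tag_sites.items() if condition in labile)
    attached = sorted(t for t, labile in tag_sites.items() if condition not in labile)
    return released, attached

# Two-tag linker from the text: one site labile only to base, the other only to photolysis.
sites = {"T1": {Cleavage.BASE}, "T2": {Cleavage.PHOTOLYSIS}}
print(release(sites, Cleavage.BASE))
print(release(sites, Cleavage.PHOTOLYSIS))
```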
[0126] A linker which is useful in the present invention possesses several attributes:

1) The linker possesses a chemical handle (Lh) through which it can be attached to an MOI.

2) The linker possesses a second, separate chemical handle (Lh) through which the tag is attached to the linker. If multiple tags are attached to a single linker ((T)n-L-MOI type structures), then a separate handle exists for each tag.

3) The linker is stable toward all manipulations to which it is subjected, with the exception of the conditions which allow cleavage such that a T-containing moiety is released from the remainder of the compound, including the MOI. Thus, the linker is stable during attachment of the tag to the linker, attachment of the linker to the MOI, and any manipulations of the MOI while the tag and linker (T-L) are attached to it.

4) The linker does not significantly interfere with the manipulations performed on the MOI while the T-L is attached to it. For instance, if the T-L is attached to an oligonucleotide, the T-L must not significantly interfere with any hybridization or enzymatic reactions (e.g., PCR) performed on the oligonucleotide. Similarly, if the T-L is attached to an antibody, it must not significantly interfere with antigen recognition by the antibody.

5) Cleavage of the tag from the remainder of the compound occurs in a highly controlled manner, using physical or chemical processes that do not adversely affect the detectability of the tag.

[0127] For any given linker, it is preferred that the linker be attachable to a wide variety of MOIs, and that a wide variety of tags be attachable to the linker. Such flexibility is advantageous because it allows a library of T-L conjugates, once prepared, to be used with several different sets of MOIs.
[0128] As explained above, a preferred linker has the formula

Lh-L1-L2-L3-Lh

wherein each Lh is a reactive handle that can be used to link the linker to a tag reactant and a molecule of interest reactant. L2 is an essential part of the linker, because L2 imparts lability to the linker. L1 and L3 are optional groups which effectively serve to separate L2 from the handles Lh.

[0129] L1 (which, by definition, is nearer to T than is L3) serves to separate T from the required labile moiety L2. This separation may be useful when the cleavage reaction generates particularly reactive species (e.g., free radicals) which may cause random changes in the structure of the T-containing moiety. As the cleavage site is further separated from the T-containing moiety, there is a reduced likelihood that reactive species formed at the cleavage site will disrupt the structure of the T-containing moiety. Also, as the atoms in L1 will typically be present in the T-containing moiety, these L1 atoms may impart a desirable quality to the T-containing moiety. For example, where the T-containing moiety is a T^ms-containing moiety, and a hindered amine is desirably present as part of the structure of the T^ms-containing moiety (to serve, e.g., as an MSSE), the hindered amine may be present in L1.

[0130] In other instances, L1 and/or L3 may be present in a linker component merely because the commercial supplier of a linker chooses to sell the linker in a form having such an L1 and/or L3 group. In such an instance, there is no harm in using linkers having L1 and/or L3 groups (so long as these groups do not inhibit the cleavage reaction), even though they may not contribute any particular performance advantage to the compounds that incorporate them. Thus, the present invention allows for L1 and/or L3 groups to be present in the linker component.

[0131] L1 and/or L3 groups may be a direct bond (in which case the group is effectively not present), a hydrocarbylene group (e.g., alkylene, arylene, cycloalkylene, etc.), O-hydrocarbylene (e.g., -O-CH2-, -O-CH2CH(CH3)-, etc.) or hydrocarbylene-(O-hydrocarbylene)w- wherein w is an integer ranging from 1 to about 10 (e.g., -CH2-O-Ar-, -CH2-(O-CH2CH2)4-, etc.).
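Because atoms of L1 typically remain in the released T-containing moiety ([0129]), the spacer choice shifts the tag mass. The Python sketch below builds the -CH2-(O-CH2CH2)w- example and reports its composition and added monoisotopic mass. Treating w = 10 as a hard upper limit is an assumption, since the text says "about 10".

```python
# Monoisotopic masses (Da).
MONO = {"C": 12.0, "H": 1.007825, "O": 15.994915}

def peg_spacer(w):
    """Return (text, composition, added mass) for a -CH2-(O-CH2CH2)w- spacer."""
    # Assumption: w is limited to 1..10; the source gives "1 to about 10".
    if not 1 <= w <= 10:
        raise ValueError("w should lie between 1 and about 10")
    composition = {"C": 1 + 2 * w, "H": 2 + 4 * w, "O": w}  # one CH2 plus w units of OC2H4
    added_mass = sum(MONO[el] * n for el, n in composition.items())
    return f"-CH2-(O-CH2CH2){w}-", composition, added_mass

for w in (1, 4, 10):
    text, comp, m = peg_spacer(w)
    print(f"{text:<22s} C{comp['C']}H{comp['H']}O{comp['O']}  +{m:.4f} Da")
```

Each extra O-CH2CH2 unit adds about 44.026 Da, so spacer length is one simple lever for moving a tag into a clear region of the mass spectrum.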
[0132] With the advent of solid phase synthesis, a great body of literature has developed regarding linkers that are labile to specific reaction conditions. In typical solid phase synthesis, a solid support is bonded through a labile linker to a reactive site, and a molecule to be synthesized is generated at the reactive site. When the molecule has been completely synthesized, the solid support-linker-molecule construct is subjected to cleavage conditions which release the molecule from the solid support. The labile linkers which have been developed for use in this context (or which may be used in this context) may also be readily used as the linker reactant in the present invention.
[0133] Lloyd-Williams, P., et al., "Convergent Solid-Phase Peptide Synthesis", Tetrahedron Report No. 347, 49(48):11065-11133 (1993) provides an extensive discussion of linkers which are labile to actinic radiation (i.e., photolysis), as well as acid, base and other cleavage conditions. Additional sources of information about labile linkers are well known in the art.

[0134] As described above, different linker designs will confer cleavability ("lability") under different specific physical or chemical conditions. Examples of conditions which serve to cleave various designs of linker include acid, base, oxidation, reduction, fluoride, thiol exchange, photolysis, and enzymatic conditions.

[0135] Examples of cleavable linkers that satisfy the general criteria for linkers listed above will be well known to those in the art and include those found in the catalog available from Pierce (Rockford, IL). Examples include:

ethylene glycol bis(succinimidylsuccinate) (EGS), an amine reactive cross-linking reagent which is cleavable by hydroxylamine (1 M at 37 °C for 3-6 hours);

disuccinimidyl tartrate (DST) and sulfo-DST, which are amine reactive cross-linking reagents, cleavable by 0.015 M sodium periodate;

bis[2-(succinimidyloxycarbonyloxy)ethyl]sulfone (BSOCOES) and sulfo-BSOCOES, which are amine reactive cross-linking reagents, cleavable by base (pH 11.6);

1,4-di-[3'-(2'-pyridyldithio)propionamido]butane (DPDPB), a pyridyldithiol crosslinker which is cleavable by thiol exchange or reduction;

N-[4-(p-azidosalicylamido)butyl]-3'-(2'-pyridyldithio)propionamide (APDP), a pyridyldithiol crosslinker which is cleavable by thiol exchange or reduction;

bis-[beta-4-(azidosalicylamido)ethyl]disulfide, a photoreactive crosslinker which is cleavable by thiol exchange or reduction;

N-succinimidyl-(4-azidophenyl)-1,3'-dithiopropionate (SADP), a photoreactive crosslinker which is cleavable by thiol exchange or reduction;

sulfosuccinimidyl-2-(7-azido-4-methylcoumarin-3-acetamide)ethyl-1,3'-dithiopropionate (SAED), a photoreactive crosslinker which is cleavable by thiol exchange or reduction;

sulfosuccinimidyl-2-(m-azido-o-nitrobenzamido)ethyl-1,3'-dithiopropionate (SAND), a photoreactive crosslinker which is cleavable by thiol exchange or reduction.


[0136] Other examples of cleavable linkers and the cleavage conditions that can be used to release tags are as follows. A silyl linking group can be cleaved by fluoride or under acidic conditions. A 3-, 4-, 5-, or 6-substituted-2-nitrobenzyloxy or 2-, 3-, 5-, or 6-substituted-4-nitrobenzyloxy linking group can be cleaved by a photon source (photolysis). A 3-, 4-, 5-, or 6-substituted-2-alkoxyphenoxy or 2-, 3-, 5-, or 6-substituted-4-alkoxyphenoxy linking group can be cleaved by Ce(NH4)2(NO3)6 (oxidation). An NCO2 (urethane) linker can be cleaved by hydroxide (base), acid, or LiAlH4 (reduction). A 3-pentenyl, 2-butenyl, or 1-butenyl linking group can be cleaved by O3, OsO4/IO4-, or KMnO4 (oxidation). A 2-[3-, 4-, or 5-substituted-furyl]oxy linking group can be cleaved by O2, Br2, MeOH, or acid.

[0137] Conditions for the cleavage of other labile linking groups include: t-alkyloxy linking groups can be cleaved by acid; methyl(dialkyl)methoxy or 4-substituted-2-alkyl-1,3-dioxolan-2-yl linking groups can be cleaved by H3O+; 2-silylethoxy linking groups can be cleaved by fluoride or acid; 2-(X)-ethoxy (where X = keto, ester, amide, cyano, NO2, sulfide, sulfoxide, sulfone) linking groups can be cleaved under alkaline conditions; 2-, 3-, 4-, 5-, or 6-substituted-benzyloxy linking groups can be cleaved by acid or under reductive conditions; 2-butenyloxy linking groups can be cleaved by (Ph3P)3RhCl(H); 3-, 4-, 5-, or 6-substituted-2-bromophenoxy linking groups can be cleaved by Li, Mg, or BuLi; methylthiomethoxy linking groups can be cleaved by Hg2+; 2-(X)-ethyloxy (where X = a halogen) linking groups can be cleaved by Zn or Mg; 2-hydroxyethyloxy linking groups can be cleaved by oxidation (e.g., with Pb(OAc)4).
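Criterion 3 of [0126] (the linker is stable to every manipulation except the chosen cleavage condition) suggests a simple filter over the linking groups listed in [0136] and [0137]. The Python sketch below transcribes a partial, simplified table of those groups. The condition names are coarse labels chosen here for illustration. A condition missing from a group's entry only means the text does not list it, not that the group is proven stable to it.

```python
# Partial table: linking group -> cleavage conditions named for it in [0136]-[0137].
LABILE_TO = {
    "silyl": {"fluoride", "acid"},
    "2-nitrobenzyloxy": {"photolysis"},
    "4-nitrobenzyloxy": {"photolysis"},
    "2-alkoxyphenoxy": {"oxidation"},
    "urethane (NCO2)": {"base", "acid", "reduction"},
    "alkenyl (3-pentenyl, 2-butenyl, 1-butenyl)": {"oxidation"},
    "t-alkyloxy": {"acid"},
    "2-silylethoxy": {"fluoride", "acid"},
    "2-(X)-ethoxy": {"base"},
    "substituted benzyloxy": {"acid", "reduction"},
    "2-hydroxyethyloxy": {"oxidation"},
}

def candidate_linkers(cleave_with, must_survive=()):
    """Groups cleaved by `cleave_with` and not listed as labile to any `must_survive` condition."""
    avoid = set(must_survive)
    return sorted(
        name for name, labile in LABILE_TO.items()
        if cleave_with in labile and not (labile & avoid)
    )

# e.g. a photocleavable linker that must survive acidic and basic handling steps
print(candidate_linkers("photolysis", ["acid", "base"]))
print(candidate_linkers("acid", ["fluoride"]))
```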

[0138] Preferred linkers are those that are cleaved by acid or photolysis. Several of the acid-labile linkers that have been developed for solid phase peptide synthesis are useful for linking tags to MOIs. Some of these linkers are described in a recent review by Lloyd-Williams et al. (Tetrahedron 49:11065-11133, 1993). One useful type of linker is based upon p-alkoxybenzyl alcohols, of which two, 4-hydroxymethylphenoxyacetic acid and 4-(4-hydroxymethyl-3-methoxyphenoxy)butyric acid, are commercially available from Advanced ChemTech (Louisville, KY). Both linkers can be attached to a tag via an ester linkage to the benzyl alcohol, and to an amine-containing MOI via an amide linkage to the carboxylic acid. Tags linked by these molecules are released from the MOI with varying concentrations of trifluoroacetic acid. The cleavage of these linkers results in the liberation of a carboxylic acid on the tag. Acid cleavage of tags attached through related linkers, such as 2,4-dimethoxy-4'-(carboxymethyloxy)benzhydrylamine (available from Advanced ChemTech in FMOC-protected form), results in liberation of a carboxylic amide on the released tag.
[0139] The photolabile linkers useful for this application have also been for the most part developed for solid phase peptide synthesis (see the Lloyd-Williams review). These linkers are usually based on 2-nitrobenzyl esters or 2-nitrobenzylamides. Two examples of photolabile linkers that have recently been reported in the literature are 4-(4-(1-(Fmoc-amino)ethyl)-2-methoxy-5-nitrophenoxy)butanoic acid (Holmes and Jones, J. Org. Chem. 60:2318-2319, 1995) and 3-(Fmoc-amino)-3-(2-nitrophenyl)propionic acid (Brown et al., Molecular Diversity 1:4-12, 1995). Both linkers can be attached via the carboxylic acid to an amine on the MOI. The attachment of the tag to the linker is made by forming an amide between a carboxylic acid on the tag and the amine on the linker. Cleavage of photolabile linkers is usually performed with UV light of 350 nm wavelength at intensities and times known to those in the art. Cleavage of the linkers results in liberation of a primary amide on the tag. Examples of photocleavable linkers include nitrophenyl glycine esters, exo- and endo-2-benzonorborneyl chlorides and methane sulfonates, and 3-amino-3-(2-nitrophenyl)propionic acid. Examples of enzymatic cleavage include esterases which will cleave ester bonds, nucleases which will cleave phosphodiester bonds, proteases which cleave peptide bonds, etc.

[0140] A preferred linker component has an ortho-nitrobenzyl structure as shown below:



[Chemical structure (ortho-nitrobenzyl linker component) not reproduced]

wherein one carbon atom at positions a, b, c, d or e is substituted with -L3-X, and L1 (which is preferably a direct bond) is present to the left of N(R1) in the above structure. Such a linker component is susceptible to selective photo-induced cleavage of the bond between the carbon labeled "a" and N(R1). The identity of R1 is not typically critical to the cleavage reaction; however, R1 is preferably selected from hydrogen and hydrocarbyl. The present invention provides that in the above structure, -N(R1)- could be replaced with -O-. Also in the above structure, one or more of positions b, c, d or e may optionally be substituted with alkyl, alkoxy, fluoride, chloride, hydroxyl, carboxylate or amide, where these substituents are independently selected at each occurrence.

[0141] A further preferred linker component with a chemical handle has the following structure:



[Chemical structure (ortho-nitrobenzyl linker component with chemical handle) not reproduced]




wherein one or more of positions b, c, d or e is substituted with hydrogen, alkyl, alkoxy, fluoride, chloride, hydroxyl, carboxylate or amide, R1 is hydrogen or hydrocarbyl, and R2 is -OH or a group that either protects or activates a carboxylic acid for coupling with another moiety. Fluorocarbon and hydrofluorocarbon groups are preferred groups that activate a carboxylic acid toward coupling with another moiety.



3. Molecule of Interest (MOI)

[0142] Examples of MOIs include nucleic acids or nucleic acid analogues (e.g., PNA), fragments of nucleic acids (i.e., nucleic acid fragments), synthetic nucleic acids or fragments, oligonucleotides (e.g., DNA or RNA), proteins, peptides, antibodies or antibody fragments, receptors, receptor ligands, members of a ligand pair, cytokines, hormones, oligosaccharides, synthetic organic molecules, drugs, and combinations thereof.

[0143] Preferred MOIs include nucleic acid fragments. Preferred nucleic acid fragments are primer sequences that are complementary to sequences present in vectors, where the vectors are used for base sequencing. Preferably a nucleic acid fragment is attached directly or indirectly to a tag at other than the 3' end of the fragment, and most preferably at the 5' end of the fragment. Nucleic acid fragments may be purchased or prepared based upon genetic databases (e.g., Dib et al., Nature 380:152-154, 1996 and CEPH Genotype Database, http://www.cephb.fr) and commercial vendors (e.g., Promega, Madison, WI).

[0144] As used herein, MOI includes derivatives of an MOI that contain functionality useful in joining the MOI to a T-L-Lh compound. For example, a nucleic acid fragment that has a phosphodiester at the 5' end, where the phosphodiester is also bonded to an alkyleneamine, is an MOI. Such an MOI is described in, e.g., U.S. Patent 4,762,779, which is incorporated herein by reference. A nucleic acid fragment with an internal modification is also an MOI. An exemplary internal modification of a nucleic acid fragment is where the base (e.g., adenine, guanine, cytosine, thymine, uracil) has been modified to add a reactive functional group. Such internally modified nucleic acid fragments are commercially available from, e.g., Glen Research, Herndon, VA. Another exemplary internal modification of a nucleic acid fragment is where an abasic phosphoramidate is used to synthesize a modified phosphodiester which is interposed between a sugar and phosphate group of a nucleic acid fragment. The abasic phosphoramidate contains a reactive group which allows a nucleic acid fragment that contains this phosphoramidate-derived moiety to be joined to another moiety, e.g., a T-L-Lh compound. Such abasic phosphoramidates are commercially available from, e.g., Clontech Laboratories, Inc., Palo Alto, CA.


4. Chemical Handles (Lh)

[0145] A chemical handle is a stable yet reactive atomic arrangement present as part of a first molecule, where the handle can undergo chemical reaction with a complementary chemical handle present as part of a second molecule, so as to form a covalent bond between the two molecules. For example, the chemical handle may be a hydroxyl group, and the complementary chemical handle may be a carboxylic acid group (or an activated derivative thereof, e.g., a hydrofluoroaryl ester), whereupon reaction between these two handles forms a covalent bond (specifically, an ester group) that joins the two molecules together.

[0146] Chemical handles may be used in a large number of covalent bond-forming reactions that are suitable for attaching tags to linkers, and linkers to MOIs. Such reactions include alkylation (e.g., to form ethers, thioethers), acylation (e.g., to form esters, amides, carbamates, ureas, thioureas), phosphorylation (e.g., to form phosphates, phosphonates, phosphoramides, phosphonamides), sulfonylation (e.g., to form sulfonates, sulfonamides), condensation (e.g., to form imines, oximes, hydrazones), silylation, disulfide formation, and generation of reactive intermediates, such as nitrenes or carbenes, by photolysis. In general, handles and bond-forming reactions which are suitable for attaching tags to linkers are also suitable for attaching linkers to MOIs, and vice versa. In some cases, the MOI may undergo prior modification or derivatization to provide the handle needed for attaching the linker.
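The pairing of complementary handles in [0145] and [0146] can be summarized as a lookup from an unordered pair of handles to the linkage formed. A minimal, non-exhaustive Python sketch; the handle names are informal labels chosen for illustration, and reaction conditions (activation, pH, oxidant) are not modeled.

```python
# Unordered pairs of complementary handles -> linkage formed (illustrative subset).
LINKAGE = {
    frozenset({"hydroxyl", "carboxylic acid"}): "ester",   # acid usually activated first
    frozenset({"amine", "carboxylic acid"}): "amide",      # acid usually activated first
    frozenset({"amine", "active ester"}): "amide",
    frozenset({"amine", "isocyanate"}): "urea",
    frozenset({"amine", "isothiocyanate"}): "thiourea",
    frozenset({"amine", "sulfonyl halide"}): "sulfonamide",
    frozenset({"amine", "aldehyde"}): "imine",
    frozenset({"thiol"}): "disulfide",                     # thiol + thiol under mild oxidation
}

def linkage(handle_a, handle_b):
    """Return the linkage formed by two handles; the order of the handles does not matter."""
    try:
        return LINKAGE[frozenset({handle_a, handle_b})]
    except KeyError:
        raise ValueError(f"no listed reaction between {handle_a!r} and {handle_b!r}") from None

print(linkage("carboxylic acid", "hydroxyl"))
print(linkage("thiol", "thiol"))
```

Using a frozenset as the key makes the lookup symmetric, matching the remark that handles suitable for tag-linker coupling also serve for linker-MOI coupling and vice versa.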

[0147] One type of bond especially useful for attaching linkers to MOIs is the disulfide bond. Its formation requires the presence of a thiol group ("handle") on the linker, and another thiol group on the MOI. Mild oxidizing conditions then suffice to bond the two thiols together as a disulfide. Disulfide formation can also be induced by using an excess of an appropriate disulfide exchange reagent, e.g., pyridyl disulfides. Because disulfide formation is readily reversible, the disulfide may also be used as the cleavable bond for liberating the tag, if desired. This is typically accomplished under similarly mild conditions, using an excess of an appropriate thiol exchange reagent, e.g., dithiothreitol.
[0148] Of particular interest for linking tags (or tags with linkers) to oligonucleotides is the formation of amide bonds. Primary aliphatic amine handles can be readily introduced onto synthetic oligonucleotides with phosphoramidites such as 6-monomethoxytritylhexylcyanoethyl-N,N-diisopropyl phosphoramidite (available from Glen Research, Sterling, VA). The amines found on natural nucleotides such as adenosine and guanosine are virtually unreactive when compared to the introduced primary amine. This difference in reactivity forms the basis of the ability to selectively form amides and related bonding groups (e.g., ureas, thioureas, sulfonamides) with the introduced primary amine, and not the nucleotide amines.

[0149] As listed in the Molecular Probes catalog (Eugene, OR), a partial enumeration of amine-reactive functional groups includes activated carboxylic esters, isocyanates, isothiocyanates, sulfonyl halides, and dichlorotriazines. Active esters are excellent reagents for amine modification since the amide products formed are very stable. Also, these reagents have good reactivity with aliphatic amines and low reactivity with the nucleotide amines of oligonucleotides. Examples of active esters include N-hydroxysuccinimide esters, pentafluorophenyl esters, tetrafluorophenyl esters, and p-nitrophenyl esters. Active esters are useful because they can be made from virtually any molecule that contains a carboxylic acid. Methods to make active esters are listed in Bodansky (Principles of Peptide Chemistry (2d ed.), Springer Verlag, London, 1993).
5. Linker Attachment

[0150] Typically, a single type of linker is used to connect a particular set or family of tags to a particular set or family of MOIs. In a preferred embodiment of the invention, a single, uniform procedure may be followed to create all the various T-L-MOI structures. This is especially advantageous when the set of T-L-MOI structures is large, because it allows the set to be prepared using the methods of combinatorial chemistry or other parallel processing technology. In a similar manner, the use of a single type of linker allows a single, uniform procedure to be employed for cleaving all the various T-L-MOI structures. Again, this is advantageous for a large set of T-L-MOI structures, because the set may be processed in a parallel, repetitive, and/or automated manner.

[0151] There are, however, other embodiments of the present invention, wherein two or more types of linker are used to connect different subsets of tags to corresponding subsets of MOIs. In this case, selective cleavage conditions may be used to cleave each of the linkers independently, without cleaving the linkers present on other subsets of MOIs.
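The selective-cleavage scheme in [0151] works only if each subset's linker responds to at least one condition that leaves every other subset's linker intact. A Python sketch of that check, under the simplifying assumption that each linker is described only by the set of conditions it is labile to (and is stable to everything else); the subset names and conditions are hypothetical.

```python
def cleavage_plan(subsets):
    """Map each MOI subset to a condition that cleaves its linker and no other subset's.

    `subsets` maps a subset name to the set of conditions its linker is labile to.
    Raises ValueError if some subset cannot be cleaved selectively.
    """
    plan = {}
    for name, labile in subsets.items():
        others = set().union(*(conds for other, conds in subsets.items() if other != name))
        selective = sorted(labile - others)
        if not selective:
            raise ValueError(f"no condition cleaves subset {name!r} without touching another subset")
        plan[name] = selective[0]  # any selective condition works; pick the first alphabetically
    return plan

SUBSETS = {
    "subset 1": {"acid"},
    "subset 2": {"photolysis"},
    "subset 3": {"thiol exchange", "reduction"},
}
print(cleavage_plan(SUBSETS))
```

Picking the alphabetically first selective condition is arbitrary; in practice the choice would also weigh how gentle each condition is toward the MOIs and tags.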
[0152] A large number of covalent bond-forming reactions are suitable for attaching tags to linkers, and linkers to MOIs. Such reactions include alkylation (e.g., to form ethers, thioethers), acylation (e.g., to form esters, amides, carbamates, ureas, thioureas), phosphorylation (e.g., to form phosphates, phosphonates, phosphoramides, phosphonamides), sulfonylation (e.g., to form sulfonates, sulfonamides), condensation (e.g., to form imines, oximes, hydrazones), silylation, disulfide formation, and generation of reactive intermediates, such as nitrenes or carbenes, by photolysis. In general, handles and bond-forming reactions which are suitable for attaching tags to linkers are also suitable for attaching linkers to MOIs, and vice versa. In some cases, the MOI may undergo prior modification or derivatization to provide the handle needed for attaching the linker.

[0153] One type of bond especially useful for attaching linkers to MOIs is the disulfide bond. Its formation requires the presence of a thiol group ("handle") on the linker, and another thiol group on the MOI. Mild oxidizing conditions then suffice to bond the two thiols together as a disulfide. Disulfide formation can also be induced by using an excess of an appropriate disulfide exchange reagent, e.g., pyridyl disulfides. Because disulfide formation is readily reversible, the disulfide may also be used as the cleavable bond for liberating the tag, if desired. This is typically accomplished under similarly mild conditions, using an excess of an appropriate thiol exchange reagent, e.g., dithiothreitol.
[0154] Of particular interest for linking tags to oligonucleotides is the formation of amide bonds. Primary aliphatic amine handles can be readily introduced onto synthetic oligonucleotides with phosphoramidites such as 6-monomethoxytritylhexylcyanoethyl-N,N-diisopropyl phosphoramidite (available from Glen Research, Sterling, VA). The amines found on natural nucleotides such as adenosine and guanosine are virtually unreactive when compared to the introduced primary amine. This difference in reactivity forms the basis of the ability to selectively form amides and related bonding groups (e.g., ureas, thioureas, sulfonamides) with the introduced primary amine, and not the nucleotide amines.

[0155] As listed in the Molecular Probes catalog (Eugene, OR), a partial enumeration of amine-reactive functional groups includes activated carboxylic esters, isocyanates, isothiocyanates, sulfonyl halides, and dichlorotriazines. Active esters are excellent reagents for amine modification since the amide products formed are very stable. Also, these reagents have good reactivity with aliphatic amines and low reactivity with the nucleotide amines of oligonucleotides. Examples of active esters include N-hydroxysuccinimide esters, pentafluorophenyl esters, tetrafluorophenyl esters, and p-nitrophenyl esters. Active esters are useful because they can be made from virtually any molecule that contains a carboxylic acid. Methods to make active esters are listed in Bodansky (Principles of Peptide Chemistry (2d ed.), Springer Verlag, London, 1993).

[0156] Numerous commercial cross-linking reagents exist which can serve as linkers (e.g., see Pierce Cross-linkers, Pierce Chemical Co., Rockford, IL). Among these are homobifunctional amine-reactive cross-linking reagents, which are exemplified by homobifunctional imidoesters and N-hydroxysuccinimidyl (NHS) esters. There also exist heterobifunctional cross-linking reagents, which possess two or more different reactive groups that allow for sequential reactions. Imidoesters react rapidly with amines at alkaline pH. NHS-esters give stable products when reacted with primary or secondary amines. Maleimides, alkyl and aryl halides, alpha-haloacyls and pyridyl disulfides are thiol reactive. Maleimides are specific for thiol (sulfhydryl) groups in the pH range of 6.5 to 7.5, and at alkaline pH can become amine reactive. The thioether linkage is stable under physiological conditions. Alpha-haloacetyl cross-linking reagents contain the iodoacetyl group and are reactive towards sulfhydryls. Imidazoles can react with the iodoacetyl moiety, but the reaction is very slow. Pyridyl disulfides react with thiol groups to form a disulfide bond. Carbodiimides couple carboxyls to primary amines or hydrazides, the latter giving rise to the formation of an acyl-hydrazine bond. The arylazides are photoaffinity reagents which are chemically inert until exposed to UV or visible light. When such compounds are photolyzed at 250-460 nm, a reactive aryl nitrene is formed. The reactive aryl nitrene is relatively non-specific. Glyoxals are reactive towards the guanidinyl portion of arginine.

[0157] In one typical embodiment of the present invention, a tag is first bonded to a linker, then the combination of tag and linker is bonded to a MOI, to create the structure T-L-MOI. Alternatively, the same structure is formed by first bonding a linker to a MOI, and then bonding the combination of linker and MOI to a tag. An example is where the MOI is a DNA primer or oligonucleotide. In that case, the tag is typically first bonded to a linker, then the T-L is bonded to a DNA primer or oligonucleotide, which is then used, for example, in a sequencing reaction.

[0158] One useful form in which a tag could be reversibly attached to an MOI (e.g., an oligonucleotide or DNA sequencing primer) is through a chemically labile linker. One preferred design for the linker allows the linker to be cleaved when exposed to a volatile organic acid, for example, trifluoroacetic acid (TFA). TFA in particular is compatible with most methods of MS ionization, including electrospray.

[0159] As described in detail below, the invention provides a method for determining the sequence of a nucleic acid molecule. A composition which may be formed by the inventive method comprises a plurality of compounds of the formula:

T^ms-L-MOI

wherein T^ms is an organic group detectable by mass spectrometry. T^ms contains carbon, at least one of hydrogen and fluoride, and may contain optional atoms including oxygen, nitrogen, sulfur, phosphorus and iodine. In the formula, L is an organic group which allows a T^ms-containing moiety to be cleaved from the remainder of the compound upon exposure of the compound to cleavage conditions. The cleaved T^ms-containing moiety includes a functional group which supports a single ionized charge state when each of the plurality of compounds is subjected to mass spectrometry. The functional group may be a tertiary amine, quaternary amine or an organic acid. In the formula, MOI is a nucleic acid fragment which is conjugated to L via the 5' end of the MOI. The term "conjugated" means that there may be chemical groups between L and the MOI, e.g., a phosphodiester group and/or an alkylene group. The nucleic acid fragment may have a sequence complementary to a portion of a vector, wherein the fragment is capable of priming nucleotide synthesis.

[0160] In the composition, no two compounds have either the same T^ms or the same MOI. In other words, the composition includes a plurality of compounds, wherein each compound has both a unique T^ms and a unique nucleic acid fragment (unique in that it has a unique base sequence). In addition, the composition may be described as having a plurality of compounds wherein each compound is defined as having a unique T^ms, where the T^ms is unique in that no other compound has a T^ms that provides the same signal by mass spectrometry. The composition therefore contains a plurality of compounds, each having a T^ms with a unique mass. The composition may also be described as having a plurality of compounds wherein each compound is defined as having a unique nucleic acid sequence. These nucleic acid sequences are intentionally unique so that each compound will serve as a primer for only one vector, when the composition is combined with vectors for nucleic acid sequencing. The set of compounds having unique T^ms groups is the same set of compounds which has unique nucleic acid sequences.

[0161] Preferably, the T^ms groups are unique in that there is at least a 2 amu, more preferably at least a 3 amu, and still more preferably at least a 4 amu mass separation between the T^ms groups of any two different compounds. In the composition, there are at least 2 different compounds, preferably there are more than 2 different compounds, and more preferably there are more than 4 different compounds. The composition may contain 100 or more different compounds, each compound having a unique T^ms and a unique nucleic acid sequence.
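The mass-separation condition above can be checked mechanically for any proposed set of tags. The sketch below is a minimal illustration, assuming each T^ms group is represented only by a nominal mass in amu; the function names and example masses are hypothetical, while the 2, 3 and 4 amu thresholds come from the text.

```python
def min_mass_separation(tag_masses):
    """Smallest pairwise mass difference (amu) among a collection of T^ms masses."""
    if len(tag_masses) < 2:
        raise ValueError("need at least two tags to compare")
    ordered = sorted(tag_masses)
    # After sorting, the closest pair of masses is always an adjacent pair.
    return min(high - low for low, high in zip(ordered, ordered[1:]))


def tags_are_resolvable(tag_masses, min_separation=2):
    """True if every pair of T^ms masses differs by at least min_separation amu."""
    return min_mass_separation(tag_masses) >= min_separation


# Hypothetical nominal masses for four tags.
masses = [301, 304, 307, 311]
print(min_mass_separation(masses))     # 3
print(tags_are_resolvable(masses, 3))  # True
print(tags_are_resolvable(masses, 4))  # False
```

Sorting first keeps the check linear after the sort, since only neighbouring masses need to be compared. A repeated mass gives a separation of 0 and therefore fails any threshold.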

[0162] Another composition that is useful in, e.g., determining the sequence of a nucleic acid molecule, includes water and a compound of the formula T^ms-L-MOI, wherein T^ms is an organic group detectable by mass spectrometry. T^ms contains carbon, at least one of hydrogen and fluoride, and may contain optional atoms including oxygen, nitrogen, sulfur, phosphorus and iodine. In the formula, L is an organic group which allows a T^ms-containing moiety to be cleaved from the remainder of the compound upon exposure of the compound to cleavage conditions. The cleaved T^ms-containing moiety includes a functional group which supports a single ionized charge state when the compound is subjected to mass spectrometry. The functional group may be a tertiary amine, quaternary amine or an organic acid. In the formula, MOI is a nucleic acid fragment attached to L at its 5' end.

[0163] In addition to water, this composition may contain a buffer, in order to maintain the pH of the aqueous composition within the range of about 5 to about 9. Furthermore, the composition may contain an enzyme, salts (such as MgCl2 and NaCl) and one of dATP, dGTP, dCTP and dTTP. A preferred composition contains water, T^ms-L-MOI and one (and only one) of ddATP, ddGTP, ddCTP, and ddTTP. Such a composition is suitable for use in the dideoxy sequencing method.

[0164] The invention also provides a composition which contains a plurality of sets of compounds, wherein each set of compounds has the formula:

T^ms-L-MOI

wherein:

[0165] T^ms is an organic group detectable by mass spectrometry, comprising carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen, nitrogen, sulfur, phosphorus and iodine. L is an organic group which allows a T^ms-containing moiety to be cleaved from the remainder of the compound, wherein the T^ms-containing moiety comprises a functional group which supports a single ionized charge state when the compound is subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and organic acid. The MOI is a nucleic acid fragment wherein L is conjugated to MOI at the MOI's 5' end.

[0166] Within a set, all members have the same T^ms group, and the MOI fragments have variable lengths that terminate with the same dideoxynucleotide selected from ddAMP, ddGMP, ddCMP and ddTMP; and between sets, the T^ms groups differ by at least 2 amu, preferably by at least 3 amu. The plurality of sets is preferably at least 5 and may number 100 or more.

[0167] In a preferred composition comprising a first plurality of sets as described above, there is additionally present a second plurality of sets of compounds having the formula

T^ms-L-MOI

wherein T^ms is an organic group detectable by mass spectrometry, comprising carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen, nitrogen, sulfur, phosphorus and iodine. L is an organic group which allows a T^ms-containing moiety to be cleaved from the remainder of the compound, wherein the T^ms-containing moiety comprises a functional group which supports a single ionized charge state when the compound is subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and organic acid. MOI is a nucleic acid fragment wherein L is conjugated to MOI at the MOI's 5' end. All members within the second plurality have an MOI sequence which terminates with the same dideoxynucleotide selected from ddAMP, ddGMP, ddCMP and ddTMP; with the proviso that the dideoxynucleotide present in the compounds of the first plurality is not the same dideoxynucleotide present in the compounds of the second plurality.

[0168] The invention also provides a kit for DNA sequencing analysis. The kit comprises a plurality of container sets, where each container set includes at least five containers. The first container contains a vector. The second, third, fourth and fifth containers contain compounds of the formula:

T^ms-L-MOI

wherein T^ms is an organic group detectable by mass spectrometry, comprising carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen, nitrogen, sulfur, phosphorus and iodine. L is an organic group which allows a T^ms-containing moiety to be cleaved from the remainder of the compound, wherein the T^ms-containing moiety comprises a functional group which supports a single ionized charge state when the compound is subjected to mass spectrometry and is selected from tertiary amine, quaternary amine and organic acid. MOI is a nucleic acid fragment wherein L is conjugated to MOI at the MOI's 5' end. The MOI for the second, third, fourth and fifth containers is identical and complementary to a portion of the vector within the set of containers, and the T^ms group within each container is different from the other T^ms groups in the kit.

[0169] Preferably, within the kit, the plurality is at least 3, i.e., there are at least three sets of containers. More preferably, there are at least 5 sets of containers.

[0170] As noted above, the present invention provides compositions and methods for determining the sequence of nucleic acid molecules. Briefly, such methods generally comprise the steps of (a) generating tagged nucleic acid fragments which are complementary to a selected nucleic acid molecule (e.g., tagged fragments from a first terminus to a second terminus of a nucleic acid molecule), wherein a tag is correlative with a particular or selected nucleotide, and may be detected by any of a variety of methods, (b) separating the tagged fragments by sequential length, (c) cleaving a tag from a tagged fragment, and (d) detecting the tags, and thereby determining the sequence of the nucleic acid molecule. Each of these aspects will be discussed in more detail below.
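Steps (b) to (d) can be illustrated with a short sketch. It assumes an idealized readout in which each detection event is a pair (fragment length, mass of the cleaved tag) and each tag mass maps to exactly one nucleotide; the mapping and masses are hypothetical. Ordering the events by length and translating each tag gives the synthesized strand, read 5' to 3' from the primer.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical assignment of tag masses (amu) to the nucleotide each tag reports.
TAG_TO_BASE: Dict[int, str] = {301: "A", 304: "C", 307: "G", 311: "T"}


def read_sequence(events: Iterable[Tuple[int, int]],
                  tag_to_base: Dict[int, str] = TAG_TO_BASE) -> str:
    """Rebuild a sequence from (fragment_length, detected_tag_mass) events."""
    # Step (b): separation orders the fragments by length.
    ordered = sorted(events, key=lambda event: event[0])
    lengths = [length for length, _ in ordered]
    if len(set(lengths)) != len(lengths):
        raise ValueError("two fragments share a length; the call is ambiguous")
    # Steps (c) and (d): each cleaved tag is identified by mass and translated.
    return "".join(tag_to_base[mass] for _, mass in ordered)


# Events arrive in arbitrary order; the shortest fragment gives the first base.
events = [(3, 307), (1, 301), (4, 311), (2, 304)]
print(read_sequence(events))  # ACGT
```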

B. SEQUENCING METHODS AND STRATEGIES

[0171] As noted above, the present invention provides methods for determining the sequence of a nucleic acid molecule. Briefly, tagged nucleic acid fragments are prepared. The nucleic acid fragments are complementary to a selected target nucleic acid molecule. In a preferred embodiment, the nucleic acid fragments are produced from a first terminus to a second terminus of a nucleic acid molecule, and more preferably from a 5' terminus to a 3' terminus. In other preferred embodiments, the tagged fragments are generated from 5'-tagged oligonucleotide primers or tagged dideoxynucleotide terminators. A tag of a tagged nucleic acid fragment is correlative with a particular nucleotide and is detectable by spectrometry (including fluorescence, but preferably other than fluorescence), or by potentiometry. In a preferred embodiment, at least five tagged nucleic acid fragments are generated and each tag is unique for a nucleic acid fragment. More specifically, the number of tagged fragments will generally range from about 5 to 2,000. The tagged nucleic acid fragments may be generated from a variety of compounds, including those set forth above. It will be evident to one in the art that the methods of the present invention are not limited to use only of the representative compounds and compositions described herein.

[0172] Following generation of tagged nucleic acid fragments, the tagged fragments are separated by sequential length. Such separation may be performed by a variety of techniques. In a preferred embodiment, separation is by liquid chromatography (LC), and particularly preferred is HPLC. Next, the tag is cleaved from the tagged fragment. The particular method for breaking a bond to release the tag is selected based upon the particular type of susceptibility of the bond to cleavage. For example, a light-sensitive bond (i.e., one that breaks by light) will be exposed to light. The released tag is detected by spectrometry or potentiometry. Preferred detection means are mass spectrometry, infrared spectrometry, ultraviolet spectrometry and potentiostatic amperometry (e.g., with an amperometric detector or coulometric detector).

[0173] It will be appreciated by one in the art that one or more of the steps may be automated, e.g., by use of an instrument. In addition, the separation, cleavage and detection steps may be performed in a continuous manner (e.g., continuous flow/continuous fluid path of tagged fragments through separation to cleavage to tag detection). For example, the various steps may be incorporated into a system, such that the steps are performed in a continuous manner. Such a system is typically in an instrument or combination of instruments format. For example, tagged nucleic acid fragments that are separated (e.g., by HPLC) may flow into a device for cleavage (e.g., a photo-reactor) and then into a tag detector (e.g., a mass spectrometer or coulometric or amperometric detector). Preferably, the device for cleavage is tunable so that an optimum wavelength for the cleavage reaction can be selected.
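The continuous separation, cleavage and detection path can be modelled as a chain of Python generators, each stage consuming the output of the previous one. This is a minimal sketch only: sorting stands in for HPLC, the linker labels are hypothetical, and the condition table simply restates cleavage examples given in this description (light, a thiol exchange reagent such as dithiothreitol, and TFA). Nothing here models instrument behaviour.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class TaggedFragment(NamedTuple):
    length: int    # fragment length, which governs elution order
    tag_mass: int  # nominal mass (amu) of the T^ms tag
    linker: str    # hypothetical label for the cleavable linker type


# Cleavage conditions keyed by linker type, restating examples from the text.
CLEAVAGE_CONDITIONS = {
    "photolabile": "light (photo-reactor)",
    "disulfide": "thiol exchange reagent, e.g. dithiothreitol",
    "acid_labile": "volatile organic acid, e.g. TFA",
}


def separate(fragments: Iterable[TaggedFragment]) -> Iterator[TaggedFragment]:
    """Stand-in for HPLC: emit fragments in order of increasing length."""
    yield from sorted(fragments, key=lambda frag: frag.length)


def cleave(stream: Iterable[TaggedFragment]) -> Iterator[Tuple[int, int, str]]:
    """Release each tag under the condition its linker is susceptible to."""
    for frag in stream:
        condition = CLEAVAGE_CONDITIONS[frag.linker]  # KeyError for an unknown linker
        yield frag.length, frag.tag_mass, condition


def detect(stream: Iterable[Tuple[int, int, str]]) -> Iterator[int]:
    """Stand-in for the tag detector: report each released tag mass in order."""
    for _length, tag_mass, _condition in stream:
        yield tag_mass


frags = [TaggedFragment(2, 304, "photolabile"),
         TaggedFragment(1, 301, "photolabile"),
         TaggedFragment(3, 307, "acid_labile")]
print(list(detect(cleave(separate(frags)))))  # [301, 304, 307]
```

Generators keep each stage independent, which mirrors the idea that the separation, cleavage and detection devices are connected in series along one fluid path.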

[0174] It will be apparent to one in the art that the methods of the present invention for nucleic acid sequencing may be performed for a variety of purposes. For example, uses of the present methods include primary sequence determination for viral, bacterial, prokaryotic and eukaryotic (e.g., mammalian) nucleic acid molecules; mutation detection; diagnostics; forensics; identity; and polymorphism detection.

1. Sequencing Methods

[0175] As noted above, compounds, including those of the present invention, may be utilized for a variety of sequencing methods, including both enzymatic and chemical degradation methods. Briefly, the enzymatic method described by Sanger (Proc. Natl. Acad. Sci. (USA) 74:5463, 1977), which utilizes dideoxy-terminators, involves the synthesis of a DNA strand from a single-stranded template by a DNA polymerase. The Sanger method of sequencing depends on the fact that dideoxynucleotides (ddNTPs) are incorporated into the growing strand in the same way as normal deoxynucleotides (albeit at a lower efficiency). However, ddNTPs differ from normal deoxynucleotides (dNTPs) in that they lack the 3'-OH group necessary for chain elongation. When a ddNTP is incorporated into the DNA chain, the absence of the 3'-hydroxy group prevents the formation of a new phosphodiester bond, and the DNA fragment is terminated with the ddNTP complementary to the base in the template DNA. The Maxam and Gilbert method (Maxam and Gilbert, Proc. Natl. Acad. Sci. (USA) 74:560, 1977) employs chemical degradation of the original DNA (in both cases the DNA must be clonal). Both methods produce populations of fragments that begin from a particular point and terminate in every base that is found in the DNA fragment that is to be sequenced. The termination of each fragment is dependent on the location of a particular base within the original DNA fragment. The DNA fragments are separated by polyacrylamide gel electrophoresis, and the order of the DNA bases (A, C, T, G) is read from an autoradiograph of the gel.
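The dideoxy termination logic described above can be sketched in a few lines. The model is idealized and its details are assumptions: termination is taken to occur at every possible site, fragment length is counted in nucleotides added beyond the primer, and the primer itself is ignored. Each ddNTP reaction yields one fragment per position at which the synthesized strand carries that base, and merging the four ladders by length recovers the synthesized strand.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template_5to3: str) -> str:
    """Strand made by the polymerase on this template, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template_5to3.upper()))


def dideoxy_ladders(template_5to3: str) -> dict:
    """For each ddNTP, the extension lengths at which chains terminate.

    Idealized: a chain stops at every site where the ddNTP complementary
    to the template base could be incorporated.
    """
    strand = synthesized_strand(template_5to3)
    ladders = {f"dd{base}TP": [] for base in "ACGT"}
    for position, base in enumerate(strand, start=1):
        ladders[f"dd{base}TP"].append(position)
    return ladders


def read_ladders(ladders: dict) -> str:
    """Merge the four ladders by fragment length, as the separation step does."""
    calls = sorted((length, name[2])  # "ddATP"[2] == "A"
                   for name, lengths in ladders.items() for length in lengths)
    return "".join(base for _, base in calls)


ladders = dideoxy_ladders("AACGTTG")
print(synthesized_strand("AACGTTG"))  # CAACGTT
print(ladders["ddCTP"])               # [1, 4]
print(read_ladders(ladders))          # CAACGTT
```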

2. Exonuclease DNA Sequencing

[0176] A procedure for determining DNA nucleotide sequences was reported by Labeit et al. (S. Labeit, H. Lehrach & R. S. Goody, DNA 5:173-7, 1986; A new method of DNA sequencing using deoxynucleoside alpha-thiotriphosphates). In the first step of the method, four DNAs, each separately substituted with a different deoxynucleoside phosphorothioate in place of the corresponding monophosphate, are prepared by template-directed polymerization catalyzed by DNA polymerase. In the second step, these DNAs are subjected to stringent exonuclease III treatment, which produces only fragments terminating with a phosphorothioate internucleotide linkage. These can then be separated by standard gel electrophoresis techniques and the sequence can be read directly as in presently used sequencing methods. Porter et al. (K. W. Porter, J. Tomasz, F. Huang, A. Sood & B. R. Shaw, Biochemistry 34:11963-11969, 1995; N7-cyanoborane-2'-deoxyguanosine 5'-triphosphate is a good substrate for DNA polymerase) described a new set of boron-substituted nucleotide analogs which are also exonuclease resistant and good substrates for a number of polymerases; these bases are also suitable for exonuclease DNA sequencing.

3. A Simplified Strategy for Sequencing Large Numbers of Full Length cDNAs.

[0177] cDNA sequencing has been suggested as an alternative to generating the complete human genomic sequence. Two approaches have been attempted. The first involves generation of expressed sequence tags (ESTs) through a single DNA sequence pass at one end of each cDNA clone. This method has given insights into the distribution of types of expressed sequences and has revealed occasional useful homology with genomic fragments, but overall has added little to our knowledge base since insufficient data from each clone is provided. The second approach is to generate complete cDNA sequence, which can indicate the possible function of the cDNAs. Unfortunately, most cDNAs are of a size range of 1-4 kilobases, which hinders the automation of full-length sequence determination. Currently the most efficient method for large scale, high throughput sequence production is sequencing from a vector/primer site, which typically yields less than 500 bases of sequence from each flank. The synthesis of new oligonucleotide primers of length 15-18 bases for "primer walking" can allow closure of each sequence. An alternative strategy for full length cDNA sequencing is to generate modified templates that are suitable for sequencing with a universal primer, but provide overlapping coverage of the molecules.

[0178] Shotgun sequencing methods can be applied to cDNA sequencing studies by preparing a separate library from each cDNA clone. These methods have not been used extensively for the analysis of the 1.5-4.0 kilobase fragments, however, as they are very labor intensive during the initial cloning phase. Instead, they have generally been applied to projects where the target sequence is of the order of 15 to 40 kilobases, such as in lambda or cosmid inserts.

4. Analogy of cDNA with Genomic Sequencing

[0179] Despite the typically different size of the individual clones to be analyzed in cDNA sequencing, there are similarities with the requirements for large scale genomic DNA sequencing. In addition to a low cost per base and a high throughput, the ideal strategy for full length cDNA sequencing will have a high accuracy. The favored current methodology for genomic DNA sequencing involves the preparation of shotgun sequencing libraries from cosmids, followed by random sequencing using ABI fluorescent DNA sequencing instruments, and closure (finishing) by directed efforts. Overall there is agreement that the fluorescent shotgun approach is superior to current alternatives in terms of efficiency and accuracy. The initial shotgun library quality is a critical determinant of the ease and quality of sequence assembly. The high quality of the available shotgun library procedure has prompted a strategy for the production of multiplex shotgun libraries containing mixtures of the smaller cDNA clones. Here the individual clones to be sequenced are mixed prior to library construction and then identified following random sequencing, at the stage of computer analysis. Junctions between individual clones are labeled during library production either by PCR or by identification of vector arm sequence.

[0180] Clones may be prepared either by microbial methods or by PCR. When using PCR, three reactions from each clone are used in order to minimize the risk of errors.

[0181] One pass sequencing is a new technique designed to speed the identification of important sequences within a new region of genomic DNA. Briefly, a high quality shotgun library is prepared and then the sequences are sampled to obtain 80-95% coverage. For a cosmid this would typically be about 200 samples. Essentially all genes are likely to have at least one exon detected in this sample using either sequence similarity (BLAST) or exon structure (GRAIL2) screening.
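The relationship between the number of samples and coverage can be estimated roughly with the Poisson (Lander-Waterman) approximation. That model is not stated in the text and is swapped in here as a common idealization: it assumes reads fall uniformly at random and ignores failed reads, vector sequence and overlap requirements. The read length of 400 bases and the 40 kb cosmid insert are assumptions chosen to be consistent with the read lengths and insert sizes mentioned in [0177] and [0178].

```python
import math


def expected_coverage_fraction(n_reads: int, read_length: int, target_length: int) -> float:
    """Expected fraction of the target covered by at least one random read.

    Poisson approximation: with redundancy c = n_reads * read_length / target_length,
    the probability that a given base is missed by every read is exp(-c).
    """
    if target_length <= 0 or read_length <= 0:
        raise ValueError("lengths must be positive")
    redundancy = n_reads * read_length / target_length
    return 1.0 - math.exp(-redundancy)


def reads_for_coverage(fraction: float, read_length: int, target_length: int) -> int:
    """Smallest read count whose expected coverage reaches `fraction` (0 < fraction < 1)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    redundancy = -math.log(1.0 - fraction)
    return math.ceil(redundancy * target_length / read_length)


# Assumed: 400-base reads from a 40 kb cosmid insert.
print(round(expected_coverage_fraction(200, 400, 40_000), 3))  # 0.865
print(reads_for_coverage(0.95, 400, 40_000))                   # 300
```

Under these assumptions, 200 samples give an expected coverage of about 86%, within the 80-95% range quoted above; real libraries will deviate from this idealized estimate.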

[0182] "Skimming" has been successfully applied to cosmids and P1s. One pass sequencing is potentially the fastest and least expensive way to find genes in a positional cloning project. The outcome is virtually assured. Most investigators are currently developing cosmid contigs for exon trapping and related techniques. Cosmids are completely suitable for sequence skimming. P1 and other BACs could be considerably cheaper since there are savings both in shotgun library construction and minimization of overlaps.

5. Shotgun Sequencing

[0183] Shotgun DNA sequencing starts with random fragmentation of the target DNA. Random sequencing is then used to generate the majority of the data. A directed phase then completes gaps, ensuring coverage of each strand in both directions. Shotgun sequencing offers the advantage of high accuracy at relatively low cost. The procedure is best suited to the analysis of relatively large fragments and is the method of choice in large scale genomic DNA sequencing.

[0184] There are several factors that are important in making shotgun sequencing accurate and cost effective. A major consideration is the quality of the shotgun library that is generated, since any clones that do not have inserts, or have chimeric inserts, will result in subsequent inefficient sequencing. Another consideration is the careful balancing of the random and the directed phases of the sequencing, so that high accuracy is obtained with a minimal loss of efficiency through unnecessary sequencing.

6. Sequencing Chemistry: Tagged-Terminator Chemistry

[0185] There are two types of fluorescent sequencing chemistries currently available: dye primer, where the primer is fluorescently labeled, and dye terminator, where the dideoxy terminators are labeled. Each of these chemistries can be used with either Taq DNA polymerase or Sequenase enzymes. Sequenase enzyme seems to read easily through G-C rich regions, palindromes, simple repeats and other difficult to read sequences. Sequenase is also good for sequencing mixed populations. Sequenase sequencing requires 5 µg of template, one extension and a multi-step cleanup process. Tagged-primer sequencing requires four separate reactions, one for each of A, C, G and T, and then a laborious cleanup protocol. Taq terminator cycle sequencing chemistry is the most robust sequencing method. With this method any sequencing primer can be used. The amount of template needed is relatively small and the whole reaction process from setup to cleanup is reasonably easy, compared to Sequenase and dye primer chemistries. Only 1.5 µg of DNA template and 4 pmol of primer are needed. To this a ready reaction mix is added. This mix consists of buffer, enzyme, dNTPs and labeled dideoxynucleotides. This reaction can be done in one tube as each of the four dideoxies is labeled with a different fluorescent dye. These labeled terminators are present in this mix in excess because they are difficult to incorporate during extension. With unclean DNA the incorporation of these high molecular weight dideoxies can be inhibited. The premix includes dITP to minimize band compression. The use of Taq as the DNA polymerase allows the reactions to be run at high temperatures to minimize secondary structure problems as well as non-specific primer binding. The whole cocktail goes through 25 cycles of denaturation, annealing and extension in a thermal cycler, and the completed reaction is spun through a Sephadex G50 (Pharmacia, Piscataway, NJ) column and is ready for gel loading after five minutes in a vacuum desiccator.

7. Designing Primers

[0186] When designing primers, the same criteria should be used as for designing PCR primers. In particular, primers should preferably be 18 to 20 nucleotides long and the 3-prime end base should be a G or a C. Primers should also preferably have a Tm of more than 50°C. Primers shorter than 18 nucleotides will work but are not recommended. The shorter the primer, the greater the probability of it binding at more than one site on the template DNA, and the lower its Tm. The sequence should have a 100% match with the template. Any mismatch, especially towards the 3-prime end, will greatly diminish sequencing ability. However, primers with 5-prime tails can be used as long as there are about 18 bases at the 3-prime end that bind. If one is designing a primer from a sequence chromatogram, an area with high confidence must be used. As one moves out past 350 to 400 bases on a standard chromatogram, the peaks get broader and the base calls are not as accurate. As described herein, the primer may possess a 5' handle through which a linker or linker-tag may be attached.
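The primer guidelines above can be turned into a simple checklist. This sketch is hypothetical in its details: the text does not say how Tm should be estimated, so it uses the Wallace rule (2°C per A or T, 4°C per G or C), a rough approximation intended for short oligonucleotides. The length check applies to the template-binding portion only, so a 5-prime tail or handle should be removed before checking, and the example sequences are made up.

```python
def wallace_tm(primer: str) -> int:
    """Rough Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def primer_problems(primer: str, min_len: int = 18, max_len: int = 20,
                    min_tm: int = 50) -> list:
    """List the ways a primer (binding portion, written 5'->3') departs from the guidelines."""
    seq = primer.upper()
    problems = []
    if set(seq) - set("ACGT"):
        problems.append("contains non-ACGT characters")
    if not min_len <= len(seq) <= max_len:
        problems.append(f"length {len(seq)} outside {min_len}-{max_len} nt")
    if not seq.endswith(("G", "C")):
        problems.append("3-prime base is not G or C")
    tm = wallace_tm(seq)
    if tm <= min_tm:  # the text asks for a Tm of more than 50 C
        problems.append(f"estimated Tm {tm} C is not above {min_tm} C")
    return problems


print(primer_problems("ACGTACGTACGTACGTAC"))     # []
print(len(primer_problems("ATATATATATATATAT")))  # 3
```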

8. Nucleic Acid Template Preparation

[0187] The most important factor in tagged-primer DNA sequencing is the quality of the template. Briefly, one common misconception is that if a template works in manual sequencing, it should work in automated sequencing. In fact, if a reaction works in manual sequencing it may work in automated sequencing; however, automated sequencing is much more sensitive and a poor quality template may result in little or no data when fluorescent sequencing methods are utilized. High salt concentrations and other cell material not properly extracted during template preparation, including RNA, may likewise prevent the ability to obtain accurate sequence information. Many mini and maxi prep protocols produce DNA which is good enough for manual sequencing or PCR, but not for automated (tagged-primer) sequencing. Also, the use of phenol is not at all recommended, as phenol can intercalate in the helix structure. The use of 100% chloroform is sufficient. There are a number of DNA preparation methods which are particularly preferred for the tagged-primer sequencing methods provided herein. In particular, maxi preps which utilize cesium chloride preparations or Qiagen (Chatsworth, CA) maxi prep columns (being careful not to overload) are preferred. For mini preps, columns such as Promega's Magic Mini prep (Madison, WI) may be utilized. When sequencing DNA fragments such as PCR fragments or restriction cut fragments, it is generally preferred to cut the desired fragment from a low melt agarose gel and then purify with a product such as GeneClean (La Jolla, CA). It is very important to make sure that only one band is cut from the gel. For PCR fragments the PCR primers or internal primers can be used in order to ensure that the appropriate fragment was sequenced. To get optimum performance from the sequence analysis software, fragments should be larger than 200 bases. Double stranded or single stranded DNA can be sequenced by this method.

[0188] An additional factor generally taken into account when preparing DNA for sequencing is the choice of host strain. Companies selling equipment and reagents for sequencing, such as ABI (Foster City, CA) and Qiagen (Chatsworth, CA), typically recommend preferred host strains, and have previously recommended strains such as DH5 alpha, HB101, XL-1 Blue, JM109 and MV1190. Even when the DNA preparations are very clean, there are other inherent factors which can make it difficult to obtain sequence. G-C rich templates are always difficult to sequence through, and secondary structure can also cause problems. Sequencing through long repeats often proves to be difficult. For instance, as Taq moves along a poly T stretch, the enzyme often falls off the template and jumps back on again, skipping a T. This results in extension products with X Ts in the poly T stretch and fragments with X-1, X-2, etc. Ts in the poly T stretch. The net effect is that more than one base appears in each position, making the sequence impossible to read.

9. Use of Molecularly Distinct Cloning Vectors

[0189] Sequencing may also be accomplished utilizing a universal cloning vector (M13) and complementary sequencing primers. Briefly, for present cloning vectors the same primer sequence is used and only 4 tags are employed (each tag is a different fluorophore which represents a different terminator (ddNTP)). Every amplification process must take place in a different container (one DNA sample per container). That is, it is impossible to mix two or more different DNA samples in the same amplification process. With only 4 tags available, only one DNA sample can be run per gel lane. There is no convenient means to deconvolute the sequence of more than one DNA sample with only 4 tags. (In this regard, workers in the field take great care not to mix or contaminate different DNA samples when using current technologies.)

[0190] A substantial advantage is gained when multiples of 4 tags can be run per gel lane or respective separation
process. In particular, utilizing tags of the present invention, more than one DNA sample can be processed in a single
amplification reaction or container. When multiples of 4 tags are available for use, each tag set can be assigned to a
particular DNA sample that is to be amplified. (A tag set is composed of a series of 4 different tags, each with a
unique property. Each tag is assigned to represent a different dideoxy-terminator: ddATP, ddGTP, ddCTP, or ddTTP.) To
employ this advantage, a series of vectors must be generated in which a unique priming site is inserted. A unique
priming site is simply a stretch of 18 nucleotides which differs from vector to vector. The remaining nucleotide
sequence is conserved from vector to vector. A sequencing primer is prepared (synthesized) which corresponds to each
unique vector. Each unique primer is derivatized (or labeled) with a unique tag set.
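The bookkeeping behind tag sets can be sketched as a lookup table. In this minimal example, tags are hypothetical integer identifiers, and sample `i` is assumed to receive tags `4*i` to `4*i + 3`, one per dideoxy-terminator; the application does not prescribe any particular numbering.

```python
TERMINATORS = ("ddATP", "ddGTP", "ddCTP", "ddTTP")


def assign_tag_sets(sample_ids: list[str]) -> dict[int, tuple[str, str]]:
    """Give each sample its own set of 4 tags, one per dideoxy-terminator.

    Tag numbers are hypothetical integers; tag 4*i + j marks terminator
    TERMINATORS[j] in sample_ids[i].
    """
    tag_map: dict[int, tuple[str, str]] = {}
    for i, sample in enumerate(sample_ids):
        for j, terminator in enumerate(TERMINATORS):
            tag_map[4 * i + j] = (sample, terminator)
    return tag_map


tags = assign_tag_sets(["vector_01", "vector_02", "vector_03"])
print(len(tags))  # 12
print(tags[6])    # ('vector_02', 'ddCTP')
```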

[0191] With these respective molecular biology tools in hand, it is possible in the present invention to process
multiple samples in a single container. First, DNA samples which are to be sequenced are cloned into the multiplicity of
vectors. For example, if 100 unique vectors are available, 100 ligation reactions, plating steps, and picking of plaques
are performed. Second, one sample from each vector type is pooled, making a pool of 100 unique vectors containing 100
unique DNA fragments or samples. A given DNA sample is therefore identified and automatically assigned a primer set
with the associated tag set. The respective primers, buffers, polymerase(s), ddNTPs, dNTPs, and cofactors are added
to the reaction container and the amplification process is carried out. The reaction is then subjected to a separation
step and the respective sequence is established from the temporal appearance of tags. The ability to pool multiple DNA
samples has substantial advantages. The reagent cost of a typical PCR reaction is about $2.00 per sample. With the
method described herein, the cost of amplification on a per-sample basis could be reduced by at least a factor of 100.
Sample handling could be reduced by a factor of at least 100, and materials costs could be reduced. The need for
large-scale amplification robots would be obviated.
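Establishing each sequence "from the temporal appearance of tags" amounts to demultiplexing a single stream of detection events. The sketch below is a simplified illustration under stated assumptions: the tag table is hypothetical, each event is an (arrival time, tag) pair, shorter fragments are assumed to arrive first, and tag-specific mobility shifts are ignored.

```python
from collections import defaultdict

# Hypothetical tag table: tag id -> (sample, base called by that tag's terminator).
TAG_TABLE = {
    0: ("sample_A", "A"), 1: ("sample_A", "G"), 2: ("sample_A", "C"), 3: ("sample_A", "T"),
    4: ("sample_B", "A"), 5: ("sample_B", "G"), 6: ("sample_B", "C"), 7: ("sample_B", "T"),
}


def demultiplex(events: list[tuple[float, int]]) -> dict[str, str]:
    """Rebuild one sequence per sample from (arrival_time, tag_id) events.

    Fragments are assumed to elute in order of length, so sorting each
    sample's detected tags by arrival time gives its bases in order of
    extension from the primer.
    """
    per_sample: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for time, tag_id in events:
        sample, base = TAG_TABLE[tag_id]
        per_sample[sample].append((time, base))
    return {
        sample: "".join(base for _, base in sorted(hits))
        for sample, hits in per_sample.items()
    }


detections = [(1.0, 3), (1.1, 4), (2.0, 1), (2.1, 7), (3.0, 2), (3.2, 6)]
print(demultiplex(detections))  # {'sample_A': 'TGC', 'sample_B': 'ATC'}
```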

10. Sequencing Vectors for Cleavable Mass Spectroscopy Tagging

[0192] Using cleavable mass spectroscopy tagging (CMST) of the present invention, each individual sequencing
reaction can be read independently and simultaneously as the separation proceeds. In CMST sequencing, a different
primer is used for each cloning vector; each reaction has 20 different primers when 20 clones are used per pool. Each
primer corresponds to one of the vectors, and each primer is tagged with a unique CMST molecule. Four reactions are
performed on each pooled DNA sample (one for each base), so every vector has four oligonucleotide primers, each one
identical in sequence but tagged with a different CMS tag. The four separate sequencing reactions are pooled and run
together. When 20 samples are pooled, 80 tags are used (4 bases per sample times 20 samples), and all 80 are
detected simultaneously as the gel is run.

[0193] The construction of the vectors may be accomplished by cloning a random 20-mer on either side of a
restriction site. The resulting clones are sequenced and a number are chosen for use as vectors. Two oligonucleotides
are prepared for each vector chosen, one homologous to the sequence at each side of the restriction site, and each
oriented so that the 3'-end is towards the restriction site. Four tagged preparations of each primer are prepared, one
for each base in the sequencing reactions, and each one labeled with a unique CMS tag.
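The orientation rule for the two primers can be expressed directly in terms of sequence. In this sketch, the left primer is taken as the top-strand sequence just upstream of the site and the right primer as the reverse complement of the sequence just downstream, so both 3' ends face the site. The EcoRI site, the toy vector, and the 4-base primer length are illustrative assumptions; the text uses 20-mers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def flanking_primers(vector: str, site: str, length: int = 20) -> tuple[str, str]:
    """Return (left, right) primers whose 3' ends point at the restriction site.

    The left primer is the `length` bases just upstream of the site, read on
    the top strand; the right primer is the reverse complement of the
    `length` bases just downstream, so it anneals to the top strand and
    extends back toward the site.
    """
    pos = vector.find(site)
    if pos < length or pos + len(site) + length > len(vector):
        raise ValueError("site not found or flanks shorter than primer length")
    left = vector[pos - length:pos]
    downstream = vector[pos + len(site):pos + len(site) + length]
    return left, reverse_complement(downstream)


# Toy vector with a hypothetical EcoRI site (GAATTC); 4-base primers for brevity.
print(flanking_primers("GGCCAATTGAATTCCGTAGGCC", "GAATTC", length=4))  # ('AATT', 'TACG')
```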

11. Advantages of Sequencing by the Use of Reversible Tags

[0194] There are substantial advantages when cleavable tags are used in sequencing and related technologies.
First, an increase in sensitivity will contribute to longer read lengths, as will the ability to collect tags for a
specified period of time prior to measurement. The use of cleavable tags permits the development of a system that
equalizes bandwidth over the entire range of the gel (1-1500 nucleotides (nt), for example). This will greatly impact
the ability to obtain read lengths greater than 450 nt.
[0195] The use of cleavable multiple tags (MW identifiers) also has the advantage that multiple DNA samples can
be run on a single gel lane or separation process. For example, it is possible using the methodologies disclosed herein
to combine at least 96 samples and 4 sequencing reactions (A, G, T, C) on a single lane or fragment sizing process. If
multiple vectors are employed which possess unique priming sites, then at least 384 samples can be combined per gel
lane (the different terminator reactions cannot be amplified together with this scheme). When the ability to employ
cleavable tags is combined with the ability to use multiple vectors, an apparent 10,000-fold increase in DNA sequencing
throughput is achieved. Also, in the schemes described herein, reagent use and disposables decrease, with a
resultant decrease in operating costs to the consumer.

[0196] An additional advantage is gained from the ability to process internal controls throughout the entire
methodologies described here. For any set of samples, an internal control nucleic acid can be placed in the sample(s).
This is not possible with the current configurations. This advantage permits the control of the amplification process,
the separation process, the tag detection system, and sequence assembly. This is an immense advantage over current
systems in which the controls are always separated from the samples in all steps.

[0197] The compositions and methods described herein also have the advantage that they are modular in nature
and can be fitted to any type of separation process or method and, in addition, can be fitted to any type of detection
system as improvements are made in either type of technology. For example, the methodologies described herein can
be coupled with "bundled" CE arrays or microfabricated devices that enable separation of DNA fragments.

C. SEPARATION OF DNA FRAGMENTS

[0198] A sample that requires analysis is often a mixture of many components in a complex matrix. For samples
containing unknown compounds, the components must be separated from each other so that each individual component
can be identified by other analytical methods. The separation properties of the components in a mixture are constant
under constant conditions, and therefore, once determined, they can be used to identify and quantify each of the
components. Such procedures are typical in chromatographic and electrophoretic analytical separations.
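Identifying components from previously determined separation properties can be sketched as a nearest-match lookup on retention time. The reference times, compound names, and tolerance below are hypothetical, and the approach assumes the constant run conditions described above.

```python
from typing import Optional


def identify_peaks(observed: list[float], library: dict[str, float],
                   tolerance: float) -> dict[float, Optional[str]]:
    """Match each observed peak time to the library compound whose
    previously measured retention time is closest, if within `tolerance`;
    otherwise report the peak as unidentified (None)."""
    matches: dict[float, Optional[str]] = {}
    for t in observed:
        name, ref = min(library.items(), key=lambda item: abs(item[1] - t))
        matches[t] = name if abs(ref - t) <= tolerance else None
    return matches


reference_times = {"compound_1": 2.10, "compound_2": 3.45, "compound_3": 5.80}  # minutes
result = identify_peaks([2.12, 3.40, 4.70], reference_times, tolerance=0.1)
print(result)  # {2.12: 'compound_1', 3.4: 'compound_2', 4.7: None}
```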

1. High-Performance Liquid Chromatography (HPLC)

[0199] High-performance liquid chromatography (HPLC) is a chromatographic technique used to separate
compounds that are dissolved in solution. HPLC instruments consist of a reservoir of mobile phase, a pump, an injector,
a separation column, and a detector. Compounds are separated by injecting an aliquot of the sample mixture onto the
column. The different components in the mixture pass through the column at different rates due to differences in their
partitioning behavior between the mobile liquid phase and the stationary phase.
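The link between partitioning and migration rate is commonly expressed with the standard retention-factor relations below. These are general chromatography definitions rather than formulas from this application: t_0 is the transit time of an unretained compound, t_R the retention time of the analyte, K its distribution coefficient (concentration in the stationary phase divided by concentration in the mobile phase), and V_s and V_m the stationary- and mobile-phase volumes.

```latex
k = \frac{t_R - t_0}{t_0}, \qquad
t_R = t_0\,(1 + k), \qquad
k = K\,\frac{V_s}{V_m}, \qquad
\alpha = \frac{k_2}{k_1}
```

Two components elute at different times only when their distribution coefficients differ, i.e. when the selectivity \alpha differs from 1.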

[0200] Recently, ion-pair reversed-phase HPLC (IP-RP-HPLC) on non-porous PS/DVB particles with chemically bonded
alkyl chains has been shown to be a rapid alternative to capillary electrophoresis in the analysis of both single- and
double-stranded nucleic acids, providing similar degrees of resolution (Huber et al., 1993, Anal. Biochem., 212, p351;
Huber et al., 1993, Nuc. Acids Res., 21, p1061; Huber et al., 1993, Biotechniques, 16, p898). In contrast to ion-exchange
chromatography, which does not always retain double-stranded DNA as a function of strand length (since AT base pairs
interact with the positively charged stationary phase more strongly than GC base pairs), IP-RP-HPLC enables a strictly
size-dependent separation.

[0201] A method has been developed using 100 mM triethylammonium acetate as ion-pairing reagent; phosphodiester
oligonucleotides could be successfully separated on alkylated non-porous 2.3 µm poly(styrene-divinylbenzene)
particles by means of high performance liquid chromatography (Oefner et al., 1994, Anal. Biochem., 223, p39). The
technique described allowed the separation of PCR products differing by only 4 to 8 base pairs in length within a size
range of 50 to 200 nucleotides.

2. Electrophoresis 

[0202] Electrophoresis is a separation technique that is based on the mobility of ions (or DNA, as is the case
described herein) in an electric field. Negatively charged DNA migrates towards a positive electrode and positively
charged ions migrate toward a negative electrode. For safety reasons, one electrode is usually at ground and the
other is biased positively or negatively. Charged species have different migration rates depending on their total
charge, size, and shape, and can therefore be separated. An electrophoresis apparatus consists of a high-voltage power
supply, electrodes, buffer, and a support for the buffer, such as a polyacrylamide gel or a capillary tube. Open
capillary tubes are used for many types of samples, and the other gel supports are usually used for biological samples
such as protein mixtures or DNA fragments.
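The dependence of migration rate on the applied field can be made concrete with the relation v = mu * E. This sketch assumes a uniform field, ignores electroosmotic flow, and uses illustrative values (mobility magnitude, voltage, and lengths) that are not taken from this application.

```python
def migration_time_s(mobility_cm2_per_v_s: float, voltage_v: float,
                     total_length_cm: float, detector_length_cm: float) -> float:
    """Time for a species to reach the detector in a uniform field.

    E = V / L_total (V/cm); drift velocity v = mu * E (cm/s); t = L_det / v.
    The mobility is taken as a magnitude, since only the speed matters here.
    """
    field = voltage_v / total_length_cm
    velocity = mobility_cm2_per_v_s * field
    return detector_length_cm / velocity


t_300 = migration_time_s(2e-4, 15_000, 50.0, 40.0)  # 300 V/cm
t_600 = migration_time_s(2e-4, 30_000, 50.0, 40.0)  # 600 V/cm
print(round(t_300), round(t_600))  # 667 333
```

Doubling the field halves the migration time, which is why the achievable field strength limits separation speed.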
3. Capillary Electrophoresis (CE)

[0203] Capillary electrophoresis (CE) in its various manifestations (free solution, isotachophoresis, isoelectric
focusing, polyacrylamide gel, micellar electrokinetic "chromatography") is developing as a method for rapid,
high-resolution separations of very small sample volumes of complex mixtures. In combination with the inherent
sensitivity and selectivity of MS, CE-MS is a potentially powerful technique for bioanalysis. In the novel application
disclosed herein, the interfacing of these two methods will lead to superior DNA sequencing methods that eclipse the
rate of current sequencing methods by several orders of magnitude.

[0204] The correspondence between CE and electrospray ionization (ESI) flow rates, and the fact that both are
facilitated by (and primarily used for) ionic species in solution, provide the basis for an extremely attractive
combination. The combination of both capillary zone electrophoresis (CZE) and capillary isotachophoresis with
quadrupole mass spectrometers based upon ESI has been described (Olivares et al., Anal. Chem. 59:1230, 1987; Smith
et al., Anal. Chem. 60:436, 1988; Loo et al., Anal. Chem. 179:404, 1989; Edmonds et al., J. Chroma. 474:21, 1989; Loo
et al., J. Microcolumn Sep. 1:223, 1989; Lee et al., J. Chromatog. 455:313, 1988; Smith et al., J. Chromatog. 480:211,
1989; Grese et al., J. Am. Chem. Soc. 117:2835, 1989). Small peptides are easily amenable to CZE analysis with good
(femtomole) sensitivity.

[0205] The most powerful separation method for DNA fragments is polyacrylamide gel electrophoresis (PAGE),
generally in a slab gel format. However, the major limitation of the current technology is the relatively long time
required to perform the gel electrophoresis of DNA fragments produced in the sequencing reactions. An order-of-magnitude
(10-fold) increase in speed can be achieved with the use of capillary electrophoresis, which utilizes ultrathin gels. In
free solution, to a first approximation, all DNA migrates with the same mobility, as the addition of a base results in
the compensation of mass and charge. In polyacrylamide gels, DNA fragments sieve and migrate as a function of length,
and this approach has now been applied to CE. Remarkable plate numbers per meter have now been achieved with
cross-linked polyacrylamide (10^7 plates per meter, Cohen et al., Proc. Natl. Acad. Sci. USA 85:9660, 1988). Such CE
columns can be employed for DNA sequencing. The method of CE is in principle 25 times faster than slab gel
electrophoresis in a standard sequencer. For example, about 300 bases can be read per hour. The separation speed in
slab gel electrophoresis is limited by the magnitude of the electric field which can be applied to the gel without
excessive heat production. Therefore, the greater speed of CE is achieved through the use of higher field strengths
(300 V/cm in CE versus 10 V/cm in slab gel electrophoresis). The capillary format reduces the amperage, and thus the
power and the resultant heat generation.
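Why the capillary format tolerates higher fields can be seen from Ohm's law for a conducting cylinder: at a given field, current scales with cross-sectional area, and heat per unit length scales with the square of the field. The sketch below assumes a uniform buffer conductivity and a hypothetical 75 µm capillary; the numbers are illustrative only.

```python
import math


def joule_power_per_cm(field_v_per_cm: float, conductivity_s_per_cm: float,
                       inner_diameter_cm: float) -> float:
    """Ohmic heat per unit length of a cylindrical conductor (W/cm).

    Current I = kappa * A * E, and power per length = I * E = kappa * A * E**2.
    """
    area = math.pi * (inner_diameter_cm / 2) ** 2
    current = conductivity_s_per_cm * area * field_v_per_cm
    return current * field_v_per_cm


p_10 = joule_power_per_cm(10, 1e-3, 0.0075)   # 10 V/cm, hypothetical 75 um bore
p_300 = joule_power_per_cm(300, 1e-3, 0.0075)  # 300 V/cm, same bore
print(f"{p_300 / p_10:.0f}")  # 900
```

A 30-fold increase in field raises heat per unit length 900-fold at fixed cross-section, so the field can only be raised if the conducting cross-section, and with it the current, is kept very small.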

[0206] Smith and others (Smith et al., Nuc. Acids Res. 18:4417, 1990) have suggested employing multiple
capillaries in parallel to increase throughput. Likewise, Mathies and Huang (Mathies and Huang, Nature 359:167, 1992)
have introduced capillary electrophoresis in which separations are performed on a parallel array of capillaries and
demonstrated high-throughput sequencing (Huang et al., Anal. Chem. 64:967, 1992; Huang et al., Anal. Chem. 64:2149,
1992). The major disadvantage of capillary electrophoresis is the limited amount of sample that can be loaded onto the
capillary. By concentrating a large amount of sample at the beginning of the capillary, prior to separation, loadability
is increased, and detection levels can be lowered by several orders of magnitude. The most popular method of
preconcentration in CE is sample stacking. Sample stacking has recently been reviewed (Chien and Burgi, Anal. Chem.
64:489A, 1992). Sample stacking depends on the matrix difference (pH, ionic strength) between the sample buffer and
the capillary buffer, so that the electric field across the sample zone is greater than in the capillary region. In
sample stacking, a large volume of sample in a low-concentration buffer is introduced for preconcentration at the head
of the capillary column. The capillary is filled with a buffer of the same composition, but at higher concentration.
When the sample ions reach the capillary buffer and the lower electric field, they stack into a concentrated zone.
Sample stacking has increased detectabilities by 1-3 orders of magnitude.
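The stacking mechanism can be sketched with an idealized model. It assumes that conductivity is proportional to buffer concentration, so the field in the sample zone is gamma = c_run / c_sample times the field in the running buffer, and that ions decelerate by the same factor at the boundary, compressing the zone by gamma; diffusion and electroosmotic flow are ignored. The concentrations and zone length are hypothetical.

```python
def stacking(sample_zone_mm: float, run_buffer_conc_mM: float,
             sample_buffer_conc_mM: float) -> tuple[float, float]:
    """Idealized stacking: returns (enhancement factor, stacked zone length in mm).

    gamma = c_run / c_sample is the ratio of the field in the sample zone to
    the field in the running buffer; ions slow by gamma when they cross the
    boundary, so the zone shortens, and the analyte concentrates, by gamma.
    """
    gamma = run_buffer_conc_mM / sample_buffer_conc_mM
    return gamma, sample_zone_mm / gamma


print(stacking(10.0, 100.0, 10.0))  # (10.0, 1.0)
```

Under this model, buffer concentration ratios of 10 to 1000 correspond to the 1-3 orders of magnitude cited above; real gains are usually lower.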

[0207] Another method of preconcentration is to apply isotachophoresis (ITP) prior to the free zone CE separation
of analytes. ITP is an electrophoretic technique which allows microliter volumes of sample to be loaded onto the
capillary, in contrast to the low-nL injection volumes typically associated with CE. The technique relies on inserting
the sample between two buffers (leading and trailing electrolytes) of higher and lower mobility, respectively, than the
analyte. The technique is inherently a concentration technique, in which the analytes concentrate into pure zones
migrating with the same speed. The technique is currently less popular than the stacking methods described above
because of the need for several choices of leading and trailing electrolytes, and the ability to separate only cationic
or anionic species during a separation process.

[0208] The heart of the DNA sequencing process is the remarkably selective electrophoretic separation of DNA or
oligonucleotide fragments. It is remarkable because each fragment is resolved and differs by only one nucleotide.
Separations of up to 1000 fragments (1000 bp) have been obtained. A further advantage of sequencing with cleavable
tags is as follows. When cleavable tags are employed, there is no requirement to use a slab gel format when DNA
fragments are separated by polyacrylamide gel electrophoresis. Since numerous samples are combined (4 to 2000), there
is no need to run samples in parallel, as is the case with current dye-primer or dye-terminator methods (i.e., the ABI
373 sequencer). Since there is no reason to run parallel lanes, there is no reason to use a slab gel. Therefore, one can
employ a tube gel format for the electrophoretic separation method. Grossman et al. (Grossman et al., Genet. Anal.
Tech. Appl. 9:9, 1992) have shown that considerable advantage is gained when a tube gel format is used in place of a
slab gel format. This is due to the greater ability to dissipate Joule heat in a tube format compared to a slab gel,
which results in faster run times (by 50%) and much higher resolution of high molecular weight DNA fragments (greater
than 1000 nt). Long reads are critical in genomic sequencing. Therefore, the use of cleavable tags in sequencing has the
additional advantage of allowing the user to employ the most efficient and sensitive DNA separation method, which also
possesses the highest resolution.

4. Microfabricated Devices

[0209] Capillary electrophoresis (CE) is a powerful method for DNA sequencing, forensic analysis, PCR product
analysis, and restriction fragment sizing. CE is far faster than traditional slab PAGE since with capillary gels a far
higher potential field can be applied. However, CE has the drawback of allowing only one sample to be processed per
gel. Microfabricated devices combine the faster separation times of CE with the ability to analyze multiple samples in
parallel. The underlying concept behind the use of microfabricated devices is the ability to increase the information
density in electrophoresis by miniaturizing the lane dimension to about 100 micrometers. The electronics industry
routinely uses microfabrication to make circuits with features of less than one micron in size. The current density of
capillary arrays is limited by the outside diameter of the capillary tube. Microfabrication of channels produces a
higher density of arrays. Microfabrication also permits physical assemblies not possible with glass fibers and links the
channels directly to other devices on a chip. Few devices have been constructed on microchips for separation
technologies. A gas chromatograph (Terry et al., IEEE Trans. Electron Device, ED-26:1880, 1979) and a liquid
chromatograph (Manz et al., Sens. Actuators B1:249, 1990) have been fabricated on silicon chips, but these devices have
not been widely used. Several groups have reported separating fluorescent dyes and amino acids on microfabricated
devices (Manz et al., J. Chromatography 593:253, 1992; Effenhauser et al., Anal. Chem. 65:2637, 1993). Recently,
Woolley and Mathies (Woolley and Mathies, Proc. Natl. Acad. Sci. 91:11348, 1994) have shown that photolithography and
chemical etching can be used to make large numbers of separation channels on glass substrates. The channels are filled
with hydroxyethyl cellulose (HEC) separation matrices. It was shown that DNA restriction fragments could be separated
in as little as two minutes.

D. CLEAVAGE OF TAGS 

[0210] As described above, different linker designs will confer cleavability ("lability") under different specific
physical or chemical conditions. Examples of conditions which serve to cleave various designs of linker include acid,
base, oxidation, reduction, fluoride, thiol exchange, photolysis, and enzymatic conditions.

[0211] Examples of cleavable linkers that satisfy the general criteria for linkers listed above will be well known to
those in the art and include those found in the catalog available from Pierce (Rockford, IL). Examples include:

ethylene glycol bis(succinimidylsuccinate) (EGS), an amine-reactive cross-linking reagent which is cleavable by
hydroxylamine (1 M at 37°C for 3-6 hours);

disuccinimidyl tartrate (DST) and sulfo-DST, which are amine-reactive cross-linking reagents, cleavable by 0.015
M sodium periodate;

bis[2-(succinimidyloxycarbonyloxy)ethyl]sulfone (BSOCOES) and sulfo-BSOCOES, which are amine-reactive
cross-linking reagents, cleavable by base (pH 11.6);

1,4-di-[3'-(2'-pyridyldithio)propionamido]butane (DPDPB), a pyridyldithiol crosslinker which is cleavable by thiol
exchange or reduction;

N-[4-(p-azidosalicylamido)butyl]-3'-(2'-pyridyldithio)propionamide (APDP), a pyridyldithiol crosslinker which is
cleavable by thiol exchange or reduction;

bis-[beta-4-(azidosalicylamido)ethyl]disulfide, a photoreactive crosslinker which is cleavable by thiol exchange or
reduction;

N-succinimidyl-(4-azidophenyl)-1,3'-dithiopropionate (SADP), a photoreactive crosslinker which is cleavable by thiol
exchange or reduction;

sulfosuccinimidyl-2-(7-azido-4-methylcoumarin-3-acetamide)ethyl-1,3'-dithiopropionate (SAED), a photoreactive
crosslinker which is cleavable by thiol exchange or reduction;

sulfosuccinimidyl-2-(m-azido-o-nitrobenzamido)-ethyl-1,3'-dithiopropionate (SAND), a photoreactive crosslinker
which is cleavable by thiol exchange or reduction.

[0212] Other examples of cleavable linkers and the cleavage conditions that can be used to release tags are as
follows. A silyl linking group can be cleaved by fluoride or under acidic conditions. A 3-, 4-, 5-, or
6-substituted-2-nitrobenzyloxy or 2-, 3-, 5-, or 6-substituted-4-nitrobenzyloxy linking group can be cleaved by a photon
source (photolysis). A 3-, 4-, 5-, or 6-substituted-2-alkoxyphenoxy or 2-, 3-, 5-, or 6-substituted-4-alkoxyphenoxy
linking group can be cleaved by Ce(NH4)2(NO3)6 (oxidation). A NCO2 (urethane) linker can be cleaved by hydroxide
(base), acid, or LiAlH4 (reduction). A 3-pentenyl, 2-butenyl, or 1-butenyl linking group can be cleaved by O3,
OsO4/IO4-, or KMnO4 (oxidation). A 2-[3-, 4-, or 5-substituted-furyl]oxy linking group can be cleaved by O2, Br2, MeOH,
or acid.

[0213] Conditions for the cleavage of other labile linking groups include: t-alkyloxy linking groups can be cleaved
by acid; methyl(dialkyl)methoxy or 4-substituted-2-alkyl-1,3-dioxolane-2-yl linking groups can be cleaved by H3O+;
2-silylethoxy linking groups can be cleaved by fluoride or acid; 2-(X)-ethoxy (where X = keto, ester, amide, cyano,
NO2, sulfide, sulfoxide, sulfone) linking groups can be cleaved under alkaline conditions; 2-, 3-, 4-, 5-, or
6-substituted-benzyloxy linking groups can be cleaved by acid or under reductive conditions; 2-butenyloxy linking groups
can be cleaved by (Ph3P)3RhCl(H); 3-, 4-, 5-, or 6-substituted-2-bromophenoxy linking groups can be cleaved by Li, Mg,
or BuLi; methylthiomethoxy linking groups can be cleaved by Hg2+; 2-(X)-ethyloxy (where X = a halogen) linking groups
can be cleaved by Zn or Mg; 2-hydroxyethyloxy linking groups can be cleaved by oxidation (e.g., with Pb(OAc)4).
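When choosing a linker for a given release step, the linker/condition pairs listed in [0212] and [0213] can be kept as a simple lookup table. The sketch below encodes only a subset of those pairs, with shortened linker-class names chosen for illustration; the helper `linkers_cleaved_by` is hypothetical.

```python
# Subset of the linker classes and cleavage conditions listed above.
CLEAVAGE_CONDITIONS: dict[str, set[str]] = {
    "silyl": {"fluoride", "acid"},
    "2-nitrobenzyloxy": {"photolysis"},
    "4-alkoxyphenoxy": {"oxidation"},
    "urethane": {"base", "acid", "reduction"},
    "t-alkyloxy": {"acid"},
    "2-silylethoxy": {"fluoride", "acid"},
    "substituted benzyloxy": {"acid", "reduction"},
    "2-hydroxyethyloxy": {"oxidation"},
}


def linkers_cleaved_by(condition: str) -> list[str]:
    """Alphabetical list of linker classes released under `condition`."""
    return sorted(name for name, conds in CLEAVAGE_CONDITIONS.items()
                  if condition in conds)


print(linkers_cleaved_by("photolysis"))  # ['2-nitrobenzyloxy']
print(linkers_cleaved_by("oxidation"))   # ['2-hydroxyethyloxy', '4-alkoxyphenoxy']
```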

[0214] Preferred linkers are those that are cleaved by acid or photolysis. Several of the acid-labile linkers that have
been developed for solid phase peptide synthesis are useful for linking tags to MOIs. Some of these linkers are
described in a recent review by Lloyd-Williams et al. (Tetrahedron 49:11065-11133, 1993). One useful type of linker is
based upon p-alkoxybenzyl alcohols, of which two, 4'-hydroxymethylphenoxyacetic acid and
4-(4-hydroxymethyl-3-methoxyphenoxy)butyric acid, are commercially available from Advanced ChemTech (Louisville, KY).
Both linkers can be attached to a tag via an ester linkage to the benzyl alcohol, and to an amine-containing MOI via an
amide linkage to the carboxylic acid. Tags linked by these molecules are released from the MOI with varying
concentrations of trifluoroacetic acid. The cleavage of these linkers results in the liberation of a carboxylic acid on
the tag. Acid cleavage of tags attached through related linkers, such as
2,4-dimethoxy-4'-(carboxymethyloxy)-benzhydrylamine (available from Advanced ChemTech in FMOC-protected form), results
in liberation of a carboxylic amide on the released tag.

[0215] The photolabile linkers useful for this application have also been for the most part developed for solid
phase peptide synthesis (see the Lloyd-Williams review). These linkers are usually based on 2-nitrobenzylesters or
2-nitrobenzylamides. Two examples of photolabile linkers that have recently been reported in the literature are
4-(4-(1-Fmoc-amino)ethyl)-2-methoxy-5-nitrophenoxy)butanoic acid (Holmes and Jones, J. Org. Chem. 60:2318-2319, 1995)
and 3-(Fmoc-amino)-3-(2-nitrophenyl)propionic acid (Brown et al., Molecular Diversity 1:4-12, 1995). Both linkers can
be attached via the carboxylic acid to an amine on the MOI. The attachment of the tag to the linker is made by forming
an amide between a carboxylic acid on the tag and the amine on the linker. Cleavage of photolabile linkers is usually
performed with UV light of 350 nm wavelength at intensities and times known to those in the art. Examples of commercial
sources of instruments for photochemical cleavage are Aura Industries Inc. (Staten Island, NY) and Agrenetics
(Wilmington, MA). Cleavage of the linkers results in liberation of a primary amide on the tag. Examples of
photocleavable linkers include nitrophenyl glycine esters, exo- and endo-2-benzonorbornenyl chlorides and methane
sulfonates, and 3-amino-3-(2-nitrophenyl)propionic acid. Examples of enzymatic cleavage include esterases, which will
cleave ester bonds; nucleases, which will cleave phosphodiester bonds; proteases, which cleave peptide bonds; etc.

E. DETECTION OF TAGS

[0216] Detection methods typically rely on absorption and emission in some type of spectral field. When atoms
or molecules absorb light, the incoming energy excites a quantized structure to a higher energy level. The type of
excitation depends on the wavelength of the light. Electrons are promoted to higher orbitals by ultraviolet or visible
light, molecular vibrations are excited by infrared light, and rotations are excited by microwaves. An absorption
spectrum is the absorption of light as a function of wavelength. The spectrum of an atom or molecule depends on its
energy level structure. Absorption spectra are useful for identification of compounds. Specific absorption
spectroscopic methods include atomic absorption spectroscopy (AA), infrared spectroscopy (IR), and UV-vis spectroscopy
(UV-vis).

[0217] Atoms or molecules that are excited to high energy levels can decay to lower levels by emitting radiation.
This light emission is called fluorescence if the transition is between states of the same spin, and phosphorescence if
the transition occurs between states of different spin. The emission intensity of an analyte is linearly proportional to
concentration (at low concentrations), and is useful for quantifying the emitting species. Specific emission
spectroscopic methods include atomic emission spectroscopy (AES), atomic fluorescence spectroscopy (AFS), molecular
laser-induced fluorescence (LIF), and X-ray fluorescence (XRF).

[0218] When electromagnetic radiation passes through matter, most of the radiation continues in its original
direction but a small fraction is scattered in other directions. Light that is scattered at the same wavelength as the
incoming light is called Rayleigh scattering. Light that is scattered in transparent solids due to vibrations (phonons)
is called Brillouin scattering. Brillouin scattering is typically shifted by 0.1 to 1 wavenumber from the incident
light. Light that is scattered due to vibrations in molecules or optical phonons in opaque solids is called Raman
scattering. Raman-scattered light is shifted by as much as 4000 wavenumbers from the incident light. Specific scattering
spectroscopic methods include Raman spectroscopy.

[0219] IR spectroscopy is the measurement of the wavelength and intensity of the absorption of mid-infrared light
by a sample. Mid-infrared light (2.5-50 µm, 4000-200 cm^-1) is energetic enough to excite molecular vibrations to
higher energy levels. The wavelengths of IR absorption bands are characteristic of specific types of chemical bonds,
and IR spectroscopy is generally most useful for identification of organic and organometallic molecules.

[0220] Near-infrared absorption spectroscopy (NIR) is the measurement of the wavelength and intensity of the
absorption of near-infrared light by a sample. Near-infrared light spans the 800 nm-2.5 µm (12,500-4000 cm^-1) range
and is energetic enough to excite overtones and combinations of molecular vibrations to higher energy levels. NIR
spectroscopy is typically used for quantitative measurement of organic functional groups, especially O-H, N-H, and
C=O. The components and design of NIR instrumentation are similar to UV-vis absorption spectrometers. The light
source is usually a tungsten lamp and the detector is usually a PbS solid-state detector. Sample holders can be glass
or quartz, and typical solvents are CCl4 and CS2. The convenient instrumentation of NIR spectroscopy makes it suitable
for on-line monitoring and process control.

[0221] Ultraviolet and visible absorption spectroscopy (UV-vis) is the measurement of the wavelength and intensity
of absorption of near-ultraviolet and visible light by a sample. Absorption occurs in the vacuum UV at 100-200 nm
(100,000-50,000 cm^-1), in the quartz UV at 200-350 nm (50,000-28,570 cm^-1), and in the visible at 350-800 nm
(28,570-12,500 cm^-1), and is described by the Beer-Lambert-Bouguer law. Ultraviolet and visible light are energetic
enough to promote outer electrons to higher energy levels. UV-vis spectroscopy can usually be applied to molecules and
inorganic ions or complexes in solution. UV-vis spectra are limited by their broad features. The light source is
usually a hydrogen or deuterium lamp for UV measurements and a tungsten lamp for visible measurements. The wavelengths
of these continuous light sources are selected with a wavelength separator such as a prism or grating monochromator.
Spectra are obtained by scanning the wavelength separator, and quantitative measurements can be made from a spectrum
or at a single wavelength.
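The wavelength/wavenumber ranges quoted in [0219] to [0221] follow from the conversion wavenumber (cm^-1) = 10^7 / wavelength (nm), and single-wavelength quantitation uses the Beer-Lambert law A = epsilon * b * c. The molar absorptivity and absorbance in the usage line are hypothetical values.

```python
def wavenumber_cm1(wavelength_nm: float) -> float:
    """Convert wavelength (nm) to wavenumber (cm^-1), using 1 cm = 1e7 nm."""
    return 1e7 / wavelength_nm


def concentration_M(absorbance: float, molar_absorptivity: float, path_cm: float) -> float:
    """Beer-Lambert law A = epsilon * b * c, solved for c (mol/L).

    molar_absorptivity is in L/(mol*cm) and path_cm is the path length b in cm.
    """
    return absorbance / (molar_absorptivity * path_cm)


for nm in (100, 200, 350, 800, 2500):
    print(nm, round(wavenumber_cm1(nm)))
# 100 100000 / 200 50000 / 350 28571 / 800 12500 / 2500 4000

c = concentration_M(0.5, 10_000, 1.0)  # hypothetical A = 0.5, epsilon = 1e4 L/(mol*cm)
```

The 350 nm boundary converts to 28,571 cm^-1, which the text rounds to 28,570.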

[0222] Mass spectrometers use the difference in the mass-to-charge ratio (m/z) of ionized atoms or molecules to
separate them from each other. Mass spectrometry is therefore useful for quantitation of atoms or molecules and also
for determining chemical and structural information about molecules. Molecules have distinctive fragmentation patterns
that provide structural information to identify compounds. The general operations of a mass spectrometer are as
follows. Gas-phase ions are created, the ions are separated in space or time based on their mass-to-charge ratio, and
the quantity of ions of each mass-to-charge ratio is measured. The ion separation power of a mass spectrometer is
described by the resolution, which is defined as R = m / delta m, where m is the ion mass and delta m is the difference
in mass between two resolvable peaks in a mass spectrum. For example, a mass spectrometer with a resolution of
1000 can resolve an ion with an m/z of 100.0 from an ion with an m/z of 100.1.
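The resolution criterion can be checked numerically. This sketch follows the definition R = m / delta m given above and assumes m is the lower of the two m/z values (conventions differ between instruments); `math.isclose` absorbs floating-point error when the required resolution sits exactly at the instrument's value.

```python
import math


def resolvable(mz_1: float, mz_2: float, resolution: float) -> bool:
    """True if R = m / delta m is high enough to separate the two peaks.

    m is taken as the lower m/z (an assumption); a result exactly at the
    threshold counts as resolvable.
    """
    m = min(mz_1, mz_2)
    required = m / abs(mz_2 - mz_1)
    return required <= resolution or math.isclose(required, resolution)


print(resolvable(100.0, 100.1, 1000))   # True
print(resolvable(100.0, 100.05, 1000))  # False
```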

[0223] In general, a mass spectrometer (MS) consists of an ion source, a mass-selective analyzer, and an ion
detector. The magnetic-sector, quadrupole, and time-of-flight designs also require extraction and acceleration ion
optics to transfer ions from the source region into the mass analyzer. The details of several mass analyzer designs (for
magnetic-sector MS, quadrupole MS, or time-of-flight MS) are discussed below. Single-focusing analyzers for
magnetic-sector MS utilize a particle beam path of 180, 90, or 60 degrees. The various forces influencing the particle
separate ions with different mass-to-charge ratios. With double-focusing analyzers, an electrostatic analyzer is added
in this type of instrument to separate particles with differences in kinetic energies.

[0224] A quadrupole mass filter for quadrupole MS consists of four metal rods arranged in parallel. The applied voltages affect the trajectory of ions traveling down the flight path centered between the four rods. For given DC and AC voltages, only ions of a certain mass-to-charge ratio pass through the quadrupole filter, and all other ions are thrown out of their original path. A mass spectrum is obtained by monitoring the ions passing through the quadrupole filter as the voltages on the rods are varied.
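To make the statement about fixed DC and AC voltages concrete, the sketch below uses the standard Mathieu-parameter description of a quadrupole, which is not given in the text: when operated at the apex of the first stability region (a ≈ 0.237, q ≈ 0.706), the transmitted mass is proportional to the RF amplitude V, and the DC voltage U is held at a fixed fraction of V as both are scanned. The rod field radius r0, the RF frequency, and the function names are hypothetical example choices.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg

# Approximate apex of the first Mathieu stability region (assumed operating point)
A_APEX = 0.23699
Q_APEX = 0.70600


def transmitted_mass_da(rf_amplitude_v: float, r0_m: float, freq_hz: float, charge: int = 1) -> float:
    """Mass (Da) of ions of the given charge sitting at the stability apex.

    Uses q = 4 z e V / (m r0^2 omega^2), solved for m at q = Q_APEX.
    """
    omega = 2 * math.pi * freq_hz
    m_kg = 4 * charge * E_CHARGE * rf_amplitude_v / (Q_APEX * r0_m ** 2 * omega ** 2)
    return m_kg / DALTON


def dc_voltage_for_apex(rf_amplitude_v: float) -> float:
    """DC voltage U that keeps the scan line through the apex, from a/q = 2U/V."""
    return A_APEX / (2 * Q_APEX) * rf_amplitude_v


# Hypothetical geometry: 4 mm field radius, 1 MHz RF.
for v in (500.0, 1000.0, 2000.0):
    m = transmitted_mass_da(v, r0_m=4e-3, freq_hz=1.0e6)
    u = dc_voltage_for_apex(v)
    print(f"V = {v:6.0f} V, U = {u:6.1f} V -> transmits m/z ~ {m:6.0f}")
```

Because the transmitted mass scales linearly with V while U tracks V at a constant ratio, ramping both voltages sweeps the transmitted m/z linearly, which is one common way the spectrum described above is acquired.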

[0225] A time-of-flight mass spectrometer uses the differences in transit time through a "drift region" to separate ions of different masses. It operates in a pulsed mode, so ions must be produced in pulses and/or extracted in pulses. A pulsed electric field accelerates all ions into a field-free drift region with a kinetic energy of qV, where q is the ion charge and V is the applied voltage. Since the ion kinetic energy is 0.5mv², where m is the ion mass and v its velocity, lighter ions have a higher velocity than heavier ions and reach the detector at the end of the drift region sooner. The output of an ion detector is displayed on an oscilloscope as a function of time to produce the mass spectrum.
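Equating qV with 0.5mv² gives v = sqrt(2qV/m), so the drift time t = L/v grows as the square root of m/z. The sketch below evaluates this relation; the 20 kV accelerating voltage and 1 m drift length are hypothetical instrument values, and source extraction time and initial energy spread are ignored.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg


def flight_time_s(mass_da: float, charge: int, accel_voltage_v: float, drift_length_m: float) -> float:
    """Drift time of an ion accelerated through accel_voltage_v.

    Kinetic energy q V = 0.5 m v^2, so v = sqrt(2 q V / m) and t = L / v.
    Ignores time spent in the source and any initial energy spread.
    """
    m_kg = mass_da * DALTON
    q = charge * E_CHARGE
    velocity = math.sqrt(2 * q * accel_voltage_v / m_kg)
    return drift_length_m / velocity


# Hypothetical instrument: 20 kV acceleration, 1 m drift tube, singly charged ions.
for mass in (500.0, 1000.0, 2000.0):
    t_us = flight_time_s(mass, 1, 20_000.0, 1.0) * 1e6
    print(f"m/z {mass:6.0f}: {t_us:5.2f} microseconds")
```

A singly charged ion of 1000 Da arrives after roughly 16 µs under these assumptions, and quadrupling the mass doubles the flight time.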

[0226] The ion formation process is the starting point for mass spectrometric analyses. Chemical ionization is a method that employs a reagent ion to react with the analyte molecules (tags) to form ions by either a proton or hydride transfer. The reagent ions are produced by introducing a large excess of methane (relative to the tag) into an electron impact (EI) ion source. Electron collisions produce CH4+ and CH3+, which further react with methane to form CH5+ and C2H5+. Another method to ionize tags is by plasma and glow discharge. Plasma is a hot, partially ionized gas that effectively excites and ionizes atoms. A glow discharge is a low-pressure plasma maintained between two electrodes. Electron impact ionization employs an electron beam, usually generated from a tungsten filament, to ionize gas-phase atoms or molecules. An electron from the beam knocks an electron off analyte atoms or molecules to create ions. Electrospray ionization utilizes a very fine needle and a series of skimmers. A sample solution is sprayed into the source chamber to form droplets. The droplets carry charge when they exit the capillary, and as the solvent vaporizes the droplets disappear, leaving highly charged analyte molecules. ESI is particularly useful for large biological molecules that are difficult to vaporize or ionize. Fast-atom bombardment (FAB) utilizes a high-energy beam of neutral atoms, typically Xe or Ar, that strikes a solid sample, causing desorption and ionization. It is used for large biological molecules that are difficult to get into the gas phase. FAB causes little fragmentation and usually gives a large molecular ion peak, making it useful for molecular weight determination. The atomic beam is produced by accelerating ions from an ion source through a charge-exchange cell. The ions pick up an electron in collisions with neutral atoms to form a beam of high-energy atoms. Laser ionization (LIMS) is a method in which a laser pulse ablates material from the surface of a sample and creates a microplasma that ionizes some of the sample constituents. Matrix-assisted laser desorption ionization (MALDI) is a LIMS method of vaporizing and ionizing large biological molecules such as proteins or DNA fragments. The biological molecules are dispersed in a solid matrix such as nicotinic acid. A UV laser pulse ablates the matrix, which carries some of the large molecules into the gas phase in an ionized form so they can be extracted into a mass spectrometer. Plasma-desorption ionization (PD) utilizes the decay of 252Cf, which produces two fission fragments that travel in opposite directions. One fragment strikes the sample, knocking out 1-10 analyte ions. The other fragment strikes a detector and triggers the start of data acquisition. This ionization method is especially useful for large biological molecules. Resonance ionization (RIMS) is a method in which one or more laser beams are tuned in resonance to transitions of a gas-phase atom or molecule to promote it in a stepwise fashion above its ionization potential to create an ion. In secondary ionization (SIMS), an ion beam, such as 3He+, 16O+, or 40Ar+, is focused onto the surface of a sample and sputters material into the gas phase. Spark source is a method which ionizes analytes in solid samples by pulsing an electric current across two electrodes.

[0227] A tag may become charged prior to, during or after cleavage from the molecule to which it is attached. Ionization methods based on ion "desorption", the direct formation or emission of ions from solid or liquid surfaces, have allowed increasing application to nonvolatile and thermally labile compounds. These methods eliminate the need for neutral molecule volatilization prior to ionization and generally minimize thermal degradation of the molecular species. These methods include field desorption (Beckey, Principles of Field Ionization and Field Desorption Mass Spectrometry, Pergamon, Oxford, 1977), plasma desorption (Sundqvist and Macfarlane, Mass Spectrom. Rev. 4:421, 1985), laser desorption (Karas and Hillenkamp, Anal. Chem. 60:2299, 1988; Karas et al., Angew. Chem. 101:805, 1989), fast particle bombardment (e.g., fast atom bombardment, FAB, and secondary ion mass spectrometry, SIMS; Barber et al., Anal. Chem. 54:645A, 1982), and thermospray (TS) ionization (Vestal, Mass Spectrom. Rev. 2:447, 1983). Thermospray is broadly applied for on-line combination with liquid chromatography. The continuous-flow FAB methods (Caprioli et al., Anal. Chem. 58:2949, 1986) have also shown significant potential. A more complete listing of ionization/mass spectrometry combinations is: ion-trap mass spectrometry, electrospray ionization mass spectrometry, ion-spray mass spectrometry, liquid ionization mass spectrometry, atmospheric pressure ionization mass spectrometry, electron ionization mass spectrometry, metastable atom bombardment ionization mass spectrometry, fast atom bombard ionization mass spectrometry, MALDI mass spectrometry, photo-ionization time-of-flight mass spectrometry, laser droplet mass spectrometry, MALDI-TOF mass spectrometry, APCI mass spectrometry, nano-spray mass spectrometry, nebulised spray ionization mass spectrometry, chemical ionization mass spectrometry, resonance ionization mass spectrometry, secondary ionization mass spectrometry, and thermospray mass spectrometry.

[0228] The ionization methods amenable to nonvolatile biological compounds have overlapping ranges of applicability. Ionization efficiencies are highly dependent on matrix composition and compound type. Currently available results indicate that the upper molecular mass for TS is about 8000 daltons (Jones and Krolik, Rapid Comm. Mass Spectrom. 7:67, 1987). Since TS is practiced mainly with quadrupole mass spectrometers, sensitivity typically suffers disproportionately at higher mass-to-charge ratios (m/z). Time-of-flight (TOF) mass spectrometers are commercially available and possess the advantage that the m/z range is limited only by detector efficiency. Recently, two additional ionization methods have been introduced. These two methods are now referred to as matrix-assisted laser desorption (MALDI; Karas and Hillenkamp, Anal. Chem. 60:2299, 1988; Karas et al., Angew. Chem. 101:806, 1989) and electrospray ionization (ESI). Both methodologies have very high ionization efficiency (i.e., a very high ratio of [molecular ions produced]/[molecules consumed]). Sensitivity, which defines the ultimate potential of the technique, is dependent on sample size, quantity of ions, flow rate, detection efficiency and actual ionization efficiency.

[0229] Electrospray-MS is based on an idea first proposed in the 1960s (Dole et al., J. Chem. Phys. 49:2240, 1968). Electrospray ionization (ESI) is one means to produce charged molecules for analysis by mass spectroscopy. Briefly, electrospray ionization produces highly charged droplets by nebulizing liquids in a strong electrostatic field. The highly charged droplets, generally formed in a dry bath gas at atmospheric pressure, shrink by evaporation of neutral solvent until the charge repulsion overcomes the cohesive forces, leading to a "Coulombic explosion". The exact mechanism of ionization is controversial, and several groups have put forth hypotheses (Blades et al., Anal. Chem. 63:2109-14, 1991; Kebarle et al., Anal. Chem. 65:A972-86, 1993; Fenn, J. Am. Soc. Mass Spectrom. 4:524-35, 1993). Regardless of the ultimate process of ion formation, ESI produces charged molecules from solution under mild conditions.
[0230] The ability to obtain useful mass spectral data on small amounts of an organic molecule relies on the efficient production of ions. The efficiency of ionization for ESI is related to the extent of positive charge associated with the molecule. Improving ionization experimentally has usually involved using acidic conditions. Another method to improve ionization has been to use quaternary amines when possible (see Aebersold et al., Protein Science 1:494-503, 1992; Smith et al., Anal. Chem. 60:436-41, 1988).

[0231] Electrospray ionization is described in more detail as follows. Electrospray ion production requires two steps: dispersal of highly charged droplets at near atmospheric pressure, followed by conditions to induce evaporation. A solution of analyte molecules is passed through a needle that is kept at high electric potential. At the end of the needle, the solution disperses into a mist of small, highly charged droplets containing the analyte molecules. The small droplets evaporate quickly and, by a process of field desorption or residual evaporation, protonated protein molecules are released into the gas phase. An electrospray is generally produced by application of a high electric field to a small flow of liquid (generally 1-10 µL/min) from a capillary tube. A potential difference of 3-6 kV is typically applied between the capillary and a counter electrode located 0.2-2 cm away (where ions, charged clusters, and even charged droplets, depending on the extent of desolvation, may be sampled by the MS through a small orifice). The electric field results in charge accumulation on the liquid surface at the capillary terminus; thus the liquid flow rate, resistivity, and surface tension are important factors in droplet production. The high electric field results in disruption of the liquid surface and formation of highly charged liquid droplets. Positively or negatively charged droplets can be produced depending upon the capillary bias. The negative ion mode requires the presence of an electron scavenger such as oxygen to inhibit electrical discharge.
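The dependence of charge accumulation on the applied potential and geometry can be estimated with a commonly quoted approximation for the field at the tip of a thin capillary facing a planar counter electrode, E ≈ 2V / (r ln(4d/r)). This formula is not given in the text and is used here as an assumption; the 50 µm tip radius is a hypothetical value, while the voltage and distances fall within the 3-6 kV and 0.2-2 cm ranges quoted above.

```python
import math


def capillary_tip_field_v_per_m(voltage_v: float, tip_radius_m: float, distance_m: float) -> float:
    """Approximate field at the capillary tip: E = 2V / (r ln(4d / r)).

    Assumes a thin capillary facing a planar counter electrode with d >> r.
    """
    if distance_m <= tip_radius_m:
        raise ValueError("expects the electrode distance to greatly exceed the tip radius")
    return 2 * voltage_v / (tip_radius_m * math.log(4 * distance_m / tip_radius_m))


# 4 kV applied; hypothetical 50 um tip radius; electrode distance varied.
for d_cm in (0.2, 1.0, 2.0):
    e = capillary_tip_field_v_per_m(4000.0, 50e-6, d_cm / 100)
    print(f"d = {d_cm:3.1f} cm -> E ~ {e:.2e} V/m")
```

Under this approximation the field scales linearly with the applied potential and inversely with the tip radius, but depends only logarithmically on the electrode distance, so a tenfold change in d alters the field by well under a factor of two.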

[0232] A wide range of liquids can be sprayed electrostatically into a vacuum, or with the aid of a nebulizing agent. The use of only electric fields for nebulization leads to some practical restrictions on the range of liquid conductivity and dielectric constant. Solution conductivity of less than 10^-5 ohm^-1 is required at room temperature for a stable electrospray at useful liquid flow rates, corresponding to an aqueous electrolyte solution of < 10^-4 M. In the mode found most
useful for ESI-MS, an appropriate liquid flow rate results in dispersion of the liquid as a fine mist. A short distance from the capillary, the droplet diameter is often quite uniform and on the order of 1 µm. Of particular importance is that the total electrospray ion current increases only slightly for higher liquid flow rates. There is evidence that heating is useful for manipulating the electrospray. For example, slight heating allows aqueous solutions to be readily electrosprayed, presumably due to the decreased viscosity and surface tension. Both thermally assisted and gas-nebulization-assisted electrosprays allow higher liquid flow rates to be used, but decrease the extent of droplet charging. The formation of molecular ions requires conditions effecting evaporation of the initial droplet population. This can be accomplished at higher pressures by a flow of dry gas at moderate temperatures (<60 °C), by heating during transport through the interface, and (particularly in the case of ion-trapping methods) by energetic collisions at relatively low pressure.
[0233] Although the detailed processes underlying ESI remain uncertain, the very small droplets produced by ESI appear to allow almost any species carrying a net charge in solution to be transferred to the gas phase after evaporation of residual solvent. Mass spectrometric detection then requires that ions have a tractable m/z range (<4000 daltons for quadrupole instruments) after desolvation, as well as be produced and transmitted with sufficient efficiency. The wide range of solutes already found to be amenable to ESI-MS, and the lack of substantial dependence of ionization efficiency upon molecular weight, suggest a highly non-discriminating and broadly applicable ionization process.
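The m/z window matters less than it might seem for large molecules because ESI commonly produces multiply protonated ions [M + zH]z+, whose m/z is (M + z·1.007276)/z. This behaviour is standard ESI background rather than something stated in the paragraph above, and protonation as the charging mechanism is an assumption. The sketch lists which charge states of a hypothetical 10,000 Da species fall below m/z 4000 and recovers the charge and neutral mass from two adjacent charge-state peaks.

```python
from typing import List

PROTON_MASS = 1.007276  # Da, mass of a proton


def mz_of_protonated(neutral_mass: float, z: int) -> float:
    """m/z of the [M + zH]z+ ion formed by adding z protons to a neutral of mass M."""
    return (neutral_mass + z * PROTON_MASS) / z


def charge_states_below(neutral_mass: float, mz_limit: float = 4000.0, z_max: int = 100) -> List[int]:
    """Charge states (1..z_max) whose [M + zH]z+ ions fall below mz_limit."""
    return [z for z in range(1, z_max + 1) if mz_of_protonated(neutral_mass, z) < mz_limit]


def charge_from_adjacent_peaks(mz_z: float, mz_z_plus_1: float) -> float:
    """Charge z of the higher-m/z peak in a pair of adjacent charge states.

    Both peaks share M: z (p_z - H) = (z + 1)(p_z+1 - H), so
    z = (p_z+1 - H) / (p_z - p_z+1), where H is the proton mass.
    """
    return (mz_z_plus_1 - PROTON_MASS) / (mz_z - mz_z_plus_1)


M = 10_000.0  # hypothetical neutral mass, Da
print(charge_states_below(M)[:3])  # [3, 4, 5]
p5, p6 = mz_of_protonated(M, 5), mz_of_protonated(M, 6)
z = round(charge_from_adjacent_peaks(p5, p6))
print(z, round(z * (p5 - PROTON_MASS), 3))  # 5 10000.0
```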
[0234] The electrospray ion "source" functions at near atmospheric pressure. The electrospray "source" is typically a metal or glass capillary incorporating a method for electrically biasing the liquid solution relative to a counter electrode. Solutions, typically water-methanol mixtures containing the analyte and often other additives such as acetic acid, flow to the capillary terminus. An ESI source has been described (Smith et al., Anal. Chem. 62:885, 1990) which can accommodate essentially any solvent system. Typical flow rates for ESI are 1-10 µL/min. The principal requirement of an ESI-MS interface is to sample and transport ions from the high-pressure region into the MS as efficiently as possible.

[0235] The efficiency of ESI can be very high, providing the basis for extremely sensitive measurements, which is useful for the invention described herein. Current instrumental performance can provide a total ion current at the detector of about 2 x 10^-12 A or about 10^7 counts/s for singly charged species. On the basis of the instrumental performance, concentrations as low as 10^-10 M or about 10^-18 mol/s of a singly charged species will give detectable ion current (about 10 counts/s) if the analyte is completely ionized. For example, low attomole detection limits have been obtained for quaternary ammonium ions using an ESI interface with capillary zone electrophoresis (Smith et al., Anal. Chem. 59:1230, 1988). For a compound of molecular weight 1000, the average number of charges is 1, the approximate
number of charge states is 1 , peak width (m/z) is 1 and the maximum intensity (ion/s) is 1 x 10^^. 
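Order-of-magnitude figures of this kind can be checked with simple unit conversions: an ion current divided by the elementary charge gives ions per second, and concentration times volumetric flow gives moles per second. The sketch below assumes a detector current of 2 × 10^-12 A, an analyte concentration of 10^-10 M, a flow of 1 µL/min (the low end of the 1-10 µL/min range quoted earlier), and a hypothetical overall detection efficiency of 10^-5; all function names are illustrative.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # C
AVOGADRO = 6.02214076e23             # 1/mol


def ions_per_second_from_current(current_a: float, charge: int = 1) -> float:
    """Convert a detector ion current (A) to ions per second."""
    return current_a / (charge * ELEMENTARY_CHARGE)


def mol_per_second(concentration_m: float, flow_ul_per_min: float) -> float:
    """Analyte delivery rate for a concentration (mol/L) and a flow (uL/min)."""
    litres_per_second = flow_ul_per_min * 1e-6 / 60.0
    return concentration_m * litres_per_second


def expected_counts_per_second(
    concentration_m: float,
    flow_ul_per_min: float,
    ionization_eff: float = 1.0,
    detection_eff: float = 1.0,
) -> float:
    """Ions reaching the detector per second under the assumed efficiencies."""
    ions = mol_per_second(concentration_m, flow_ul_per_min) * AVOGADRO
    return ions * ionization_eff * detection_eff


print(f"{ions_per_second_from_current(2e-12):.2e} ions/s")   # about 1.25e7
print(f"{mol_per_second(1e-10, 1.0):.2e} mol/s")              # about 1.67e-18
print(f"{expected_counts_per_second(1e-10, 1.0, detection_eff=1e-5):.1f} counts/s")
```

With complete ionization, about 10^6 analyte ions per second are delivered under these assumptions, so an overall transmission and detection efficiency near 10^-5 brings the count rate to roughly 10 counts/s.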
[0236] Remarkably little sample is actually consumed in obtaining an ESI mass spectrum (Smith et al., Anal. Chem. 60:1948, 1988). Substantial gains might also be obtained by the use of array detectors with sector instruments, allowing simultaneous detection of portions of the spectrum. Since currently only about 10^-5 of all ions formed by ESI are detected, attention to the factors limiting instrument performance may provide a basis for improved sensitivity. It will be evident to those in the art that the present invention contemplates and accommodates improvements in ionization and detection methodologies.
[0237] An interface is preferably placed between the separation instrumentation (e.g., gel) and the detector (e.g., a mass spectrometer). The interface preferably has the following properties: (1) the ability to collect the DNA fragments at discrete time intervals, (2) concentrate the DNA fragments, (3) remove the DNA fragments from the electrophoresis buffers and milieu, (4) cleave the tag from the DNA fragment, (5) separate the tag from the DNA fragment, (6) dispose of the DNA fragment, (7) place the tag in a volatile solution, (8) volatilize and ionize the tag, and (9) place or transport the tag to an electrospray device that introduces the tag into a mass spectrometer.

[0238] The interface also has the capability of "collecting" DNA fragments as they elute from the bottom of a gel. The gel may be composed of a slab gel, a tubular gel, a capillary, etc. The DNA fragments can be collected by several methods. The first method is the use of an electric field, wherein DNA fragments are collected onto or near an electrode. A second method is that wherein the DNA fragments are collected by flowing a stream of liquid past the bottom of a gel. Aspects of both methods can be combined, wherein DNA is collected into a flowing stream which can later be concentrated by use of an electric field. The end result is that DNA fragments are removed from the milieu under which the separation method was performed. That is, DNA fragments can be "dragged" from one solution type to another by use of an electric field.

[0239] Once the DNA fragments are in the appropriate solution (compatible with electrospray and mass spectrometry), the tag can be cleaved from the DNA fragment. The DNA fragment (or remnants thereof) can then be separated from the tag by the application of an electric field (preferably, the tag is of opposite charge to that of the DNA fragment). The tag is then introduced into the electrospray device by the use of an electric field or a flowing liquid.
[0240] Fluorescent tags can be identified and quantitated most directly by their absorption and fluorescence emission wavelengths and intensities.

[0241] While a conventional spectrofluorometer is extremely flexible, providing continuous ranges of excitation and emission wavelengths (λEX, λS1, λS2), more specialized instruments such as flow cytometers and laser-scanning microscopes require probes that are excitable at a single fixed wavelength. In contemporary instruments, this is usually the 488-nm line of the argon laser.

[0242] Fluorescence intensity per probe molecule is proportional to the product of the molar extinction coefficient (ε) and the fluorescence quantum yield (QY). The range of these parameters among fluorophores of current practical importance is approximately 10,000 to 100,000 cm^-1 M^-1 for ε and 0.1 to 1.0 for QY. When absorption is driven toward saturation by high-intensity illumination, the irreversible destruction of the excited fluorophore (photobleaching) becomes the factor limiting fluorescence detectability. The practical impact of photobleaching depends on the fluorescent detection technique in question.
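Because per-molecule fluorescence intensity scales with the product ε × QY, candidate tags can be compared by that single figure of merit. The sketch below ranks three hypothetical dyes whose values are chosen within the ranges quoted above; it ignores photobleaching and the match between excitation wavelength and laser line, both of which matter in practice.

```python
from typing import Dict, List, Tuple


def brightness(epsilon_m_cm: float, quantum_yield: float) -> float:
    """Relative fluorescence output per molecule, taken as epsilon x QY."""
    if not 0.0 <= quantum_yield <= 1.0:
        raise ValueError("quantum yield must lie between 0 and 1")
    return epsilon_m_cm * quantum_yield


def rank_by_brightness(dyes: Dict[str, Tuple[float, float]]) -> List[str]:
    """Dye names sorted from brightest to dimmest."""
    return sorted(dyes, key=lambda name: brightness(*dyes[name]), reverse=True)


# Hypothetical dyes with (epsilon in cm^-1 M^-1, QY) inside the quoted ranges.
dyes = {
    "dye_A": (80_000.0, 0.90),
    "dye_B": (30_000.0, 0.95),
    "dye_C": (100_000.0, 0.10),
}
print(rank_by_brightness(dyes))  # ['dye_A', 'dye_B', 'dye_C']
```

Note that dye_C has the largest ε but ranks last: a low quantum yield can outweigh strong absorption, which is why the product rather than either parameter alone sets detectability.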

[0243] It will be evident to one in the art that a device (an interface) may be interposed between the separation and detection steps to permit the continuous operation of size separation and tag detection (in real time). This unites the separation methodology and instrumentation with the detection methodology and instrumentation, forming a single device. For example, an interface is interposed between a separation technique and detection by mass spectrometry or potentiostatic amperometry.

[0244] The function of the interface is primarily the release of the (e.g., mass spectrometry) tag from the analyte. There are several representative implementations of the interface. The design of the interface is dependent on the choice of cleavable linkers. In the case of light or photocleavable linkers, an energy or photon source is required. In the case of an acid-labile linker, a base-labile linker, or a disulfide linker, reagent addition is required within the interface. In the case of heat-labile linkers, an energy heat source is required. Enzyme addition is required for an enzyme-sensitive linker, such as a specific protease and a peptide linker, a nuclease and a DNA or RNA linker, a glycosylase, HRP or phosphatase and a linker which is unstable after cleavage (e.g., similar to chemiluminescent substrates). Other characteristics of the interface include minimal band broadening and separation of DNA from tags before injection into a mass spectrometer. Separation techniques include those based on electrophoretic methods and techniques, affinity techniques, size retention (dialysis), filtration and the like.
[0245] It is also possible to concentrate the tags (or nucleic acid-linker-tag construct), capture them electrophoretically, and then release them into an alternate reagent stream which is compatible with the particular type of ionization method selected. The interface may also be capable of capturing the tags (or nucleic acid-linker-tag construct) on microbeads, shooting the bead(s) into a chamber and then performing laser desorption/vaporization. Also, it is possible to extract in flow into an alternate buffer (e.g., from capillary electrophoresis buffer into hydrophobic buffer across a permeable membrane). It may also be desirable in some uses to deliver tags into the mass spectrometer intermittently, which would comprise a further function of the interface. Another function of the interface is to deliver tags from multiple columns into a mass spectrometer, with a rotating time slot for each column. Also, it is possible to deliver tags from a single column into multiple MS detectors, separated by time, collect each set of tags for a few milliseconds, and then deliver to a mass spectrometer.
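The "rotating time slot for each column" can be pictured as a round-robin schedule in which the mass spectrometer accepts tags from one column at a time. The sketch below is a hypothetical illustration of that bookkeeping (slot length, column count, and names are assumptions); it does not model how tags eluting outside their column's slot would be buffered.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Slot:
    start_ms: float
    end_ms: float
    column: int


def round_robin_schedule(n_columns: int, slot_ms: float, total_ms: float) -> List[Slot]:
    """Assign consecutive, equal time slots to columns in rotation."""
    if n_columns < 1 or slot_ms <= 0:
        raise ValueError("need at least one column and a positive slot length")
    slots = []
    k = 0
    while (k + 1) * slot_ms <= total_ms:
        slots.append(Slot(k * slot_ms, (k + 1) * slot_ms, k % n_columns))
        k += 1
    return slots


def column_at(t_ms: float, n_columns: int, slot_ms: float) -> int:
    """Column whose tags are being delivered to the MS at time t_ms."""
    return int(t_ms // slot_ms) % n_columns


# Hypothetical: three columns sharing one MS with 5 ms slots.
for s in round_robin_schedule(3, 5.0, 30.0):
    print(f"{s.start_ms:5.1f}-{s.end_ms:5.1f} ms: column {s.column}")
```

Because each column is revisited every n_columns × slot_ms, the slot length has to be short relative to the width of an eluting band, or bands from a column could be missed between its slots.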

[0246] The following is a list of representative vendors for separation and detection technologies which may be used in the present invention. Hoefer Scientific Instruments (San Francisco, CA) manufactures electrophoresis equipment (Two Step™, Poker Face™ II) for sequencing applications. Pharmacia Biotech (Piscataway, NJ) manufactures electrophoresis equipment for DNA separations and sequencing (PhastSystem for PCR-SSCP analysis, MacroPhor System for DNA sequencing). Perkin Elmer/Applied Biosystems Division (ABI, Foster City, CA) manufactures semi-automated sequencers based on fluorescent dyes (ABI373 and ABI377). Analytical Spectral Devices (Boulder, CO) manufactures UV spectrometers. Hitachi Instruments (Tokyo, Japan) manufactures Atomic Absorption spectrometers, Fluorescence spectrometers, LC and GC Mass Spectrometers, NMR spectrometers, and UV-VIS Spectrometers. PerSeptive Biosystems (Framingham, MA) produces Mass Spectrometers (Voyager™ Elite). Bruker Instruments Inc. (Manning Park, MA) manufactures FTIR Spectrometers (Vector 22), FT-Raman Spectrometers, Time of Flight Mass Spectrometers (Reflex II™), an Ion Trap Mass Spectrometer (Esquire™) and a MALDI Mass Spectrometer. Analytical Technology Inc. (ATI, Boston, MA) makes Capillary Gel Electrophoresis units, UV detectors, and Diode Array Detectors. Teledyne Electronic Technologies (Mountain View, CA) manufactures an Ion Trap Mass Spectrometer (3DQ Discovery™ and the 3DQ Apogee™). Perkin Elmer/Applied Biosystems Division (Foster City, CA) manufactures a Sciex Mass Spectrometer (triple quadrupole LC/MS/MS, the API 100/300) which is compatible with electrospray. Hewlett-Packard (Santa Clara, CA) produces Mass Selective Detectors (HP 5972A), MALDI-TOF Mass Spectrometers (HP G2025A), Diode Array Detectors, CE units, HPLC units (HP 1090) as well as UV Spectrometers. Finnigan Corporation (San Jose, CA) manufactures mass spectrometers (magnetic sector (MAT 95 S™), quadrupole spectrometers (MAT 95 SQ™) and four other related mass spectrometers). Rainin (Emeryville, CA) manufactures HPLC instruments.

[0247] The methods and compositions described herein permit the use of cleaved tags to serve as maps to particular sample type and nucleotide identity. At the beginning of each sequencing method, a particular (selected) primer is assigned a particular unique tag. The tags map to either a sample type, a dideoxy terminator type (in the case of a Sanger sequencing reaction) or preferably both. Specifically, the tag maps to a primer type, which in turn maps to a vector type, which in turn maps to a sample identity. The tag may also map to a dideoxy terminator type (ddTTP, ddCTP, ddGTP, ddATP) by reference to the dideoxynucleotide reaction into which the tagged primer is placed. The sequencing reaction is then performed and the resulting fragments are sequentially separated by size in time.

[0248] The tags are cleaved from the fragments in a temporal frame and measured and recorded in a temporal frame. The sequence is constructed by comparing the tag map to the temporal frame. That is, all tag identities are recorded in time after the sizing step and become related to one another in a temporal frame. The sizing step separates the nucleic acid fragments by a one-nucleotide increment, and hence the related tag identities are separated by a one-nucleotide increment. By foreknowledge of the dideoxy-terminator or nucleotide map and sample type, the sequence is readily deduced in a linear fashion.
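The deduction described above amounts to sorting detected tags by elution time and translating each tag through the map to its sample and terminator base. The sketch below assumes one unique tag per (sample, terminator) pair, clean detections with no missing or ambiguous events, and hypothetical tag names; the bases read out belong to the extended primer strand, which is complementary to the template.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical tag map: tag identity -> (sample, dideoxy terminator base).
TagMap = Dict[str, Tuple[str, str]]


def reconstruct_sequences(tag_map: TagMap, detections: Iterable[Tuple[float, str]]) -> Dict[str, str]:
    """Order detected tags by elution time and read each sample's sequence.

    Each detection is (time, tag). Because sizing separates fragments by one
    nucleotide, the time-ordered terminator bases for one sample spell out the
    extended strand from the primer outward.
    """
    reads = defaultdict(list)
    for _, tag in sorted(detections):
        sample, base = tag_map[tag]
        reads[sample].append(base)
    return {sample: "".join(bases) for sample, bases in reads.items()}


tag_map: TagMap = {
    "t01": ("S1", "A"), "t02": ("S1", "C"), "t03": ("S1", "G"), "t04": ("S1", "T"),
    "t05": ("S2", "A"), "t06": ("S2", "C"), "t07": ("S2", "G"), "t08": ("S2", "T"),
}
# Detections as recorded (time in arbitrary units, not yet sorted).
detections = [
    (3.0, "t04"), (1.1, "t08"), (1.0, "t03"), (4.1, "t07"),
    (2.0, "t01"), (3.1, "t05"), (4.0, "t02"), (2.1, "t08"),
]
print(reconstruct_sequences(tag_map, detections))  # {'S1': 'GATC', 'S2': 'TTAG'}
```

Two samples run through the same separation are read out in parallel, which is the multiplexing benefit of giving every sample-terminator combination its own tag.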

[0249] The following examples are offered by way of illustration, and not by way of limitation. 
[0250] Unless otherwise stated, chemicals as used in the examples may be obtained from Aldrich Chemical Company, Milwaukee, WI. The following abbreviations, with the indicated meanings, are used herein:

ANP = 3-(Fmoc-amino)-3-(2-nitrophenyl)propionic acid

NBA = 4-(Fmoc-aminomethyl)-3-nitrobenzoic acid

HATU = O-7-azabenzotriazol-1-yl-N,N,N',N'-tetramethyluronium hexafluorophosphate

DIEA = diisopropylethylamine

MCT = monochlorotriazine

NMM = 4-methylmorpholine

NMP = N-methylpyrrolidone

ACT357 = ACT357 peptide synthesizer from Advanced ChemTech, Inc., Louisville, KY

ACT = Advanced ChemTech, Inc., Louisville, KY

NovaBiochem = CalBiochem-NovaBiochem International, San Diego, CA

TFA = Trifluoroacetic acid

Tfa = Trifluoroacetyl

iNIP = N-Methylisonipecotic acid

Tfp = Tetrafluorophenyl

DIAEA = 2-(Diisopropylamino)ethylamine

5'-AH-ODN = 5'-aminohexyl-tailed oligodeoxynucleotide
EXAMPLES

EXAMPLE 1

PREPARATION OF ACID LABILE LINKERS FOR USE IN CLEAVABLE-MW-IDENTIFIER SEQUENCING
Synthesis of Pentafluorophenyl Esters of Chemically Cleavable Mass Spectroscopy Tags, to Liberate Tags with Carboxyl Amide Termini
[0251] Figure 1 shows the reaction scheme. 

Step A. TentaGel S AC resin (compound II; available from ACT; 1 eq.) is suspended with DMF in the collection vessel of the ACT357 peptide synthesizer (ACT). Compound I (3 eq.), HATU (3 eq.) and DIEA (7.5 eq.) in DMF are added and the collection vessel shaken for 1 hr. The solvent is removed and the resin washed with NMP (2X), MeOH (2X), and DMF (2X). The coupling of I to the resin and the wash steps are repeated, to give compound III.

Step B. The resin (compound III) is mixed with 25% piperidine in DMF and shaken for 5 min. The resin is filtered, then mixed with 25% piperidine in DMF and shaken for 10 min. The solvent is removed, the resin washed with NMP (2X), MeOH (2X), and DMF (2X), and used directly in step C.

Step C. The deprotected resin from step B is suspended in DMF and to it is added an FMOC-protected amino acid containing amine functionality in its side chain (compound IV, e.g. alpha-N-FMOC-3-(3-pyridyl)-alanine, available from Synthetech, Albany, OR; 3 eq.), HATU (3 eq.), and DIEA (7.5 eq.) in DMF. The vessel is shaken for 1 hr. The solvent is removed and the resin washed with NMP (2X), MeOH (2X), and DMF (2X). The coupling of IV to the resin and the wash steps are repeated, to give compound V.

Step D. The resin (compound V) is treated with piperidine as described in step B to remove the FMOC group. The deprotected resin is then divided equally by the ACT357 from the collection vessel into 16 reaction vessels.

Step E. The 16 aliquots of deprotected resin from step D are suspended in DMF. To each reaction vessel is added the appropriate carboxylic acid VI₁₋₁₆ (R₁₋₁₆CO2H; 3 eq.), HATU (3 eq.), and DIEA (7.5 eq.) in DMF. The vessels are shaken for 1 hr. The solvent is removed and the aliquots of resin washed with NMP (2X), MeOH (2X), and DMF (2X). The coupling of VI₁₋₁₆ to the aliquots of resin and the wash steps are repeated, to give compounds VII₁₋₁₆.

Step F. The aliquots of resin (compounds VII₁₋₁₆) are washed with CH2Cl2 (3X). To each of the reaction vessels is added 1% TFA in CH2Cl2 and the vessels shaken for 30 min. The solvent is filtered from the reaction vessels into individual tubes. The aliquots of resin are washed with CH2Cl2 (2X) and MeOH (2X) and the filtrates combined into the individual tubes. The individual tubes are evaporated in vacuo, providing compounds VIII₁₋₁₆.

Step G. Each of the free carboxylic acids VIII₁₋₁₆ is dissolved in DMF. To each solution is added pyridine (1.05 eq.), followed by pentafluorophenyl trifluoroacetate (1.1 eq.). The mixtures are stirred for 45 min. at room temperature. The solutions are diluted with EtOAc, washed with 1 M aq. citric acid (3X) and 5% aq. NaHCO3 (3X), dried over Na2SO4, filtered, and evaporated in vacuo, providing compounds IX₁₋₁₆.
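The solid-phase steps above specify reagents in equivalents relative to resin-bound sites (for example 3 eq. coupling partner, 3 eq. HATU and 7.5 eq. DIEA in Step A). The sketch below converts equivalents into millimoles, milligrams and, for neat liquids, microlitres. The resin mass and loading are hypothetical; the molecular weights (HATU 380.23, DIEA 129.25 g/mol) and DIEA density (0.742 g/mL) are standard literature values that should be checked against the supplier's data; compound I is left out because its molecular weight is not given.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Reagent:
    name: str
    equivalents: float                        # relative to resin sites
    mw_g_per_mol: float                       # molecular weight
    density_g_per_ml: Optional[float] = None  # only for reagents dispensed neat as liquids


def reagent_amounts(
    resin_mass_g: float, loading_mmol_per_g: float, reagents: Sequence[Reagent]
) -> List[Tuple[str, float, float, Optional[float]]]:
    """Return (name, mmol, mg, uL-or-None) for each reagent.

    mmol = equivalents x resin sites; mg = mmol x MW; uL = mg / density,
    since a density in g/mL equals mg/uL.
    """
    resin_mmol = resin_mass_g * loading_mmol_per_g
    rows = []
    for r in reagents:
        mmol = r.equivalents * resin_mmol
        mg = mmol * r.mw_g_per_mol
        ul = mg / r.density_g_per_ml if r.density_g_per_ml else None
        rows.append((r.name, mmol, mg, ul))
    return rows


# Hypothetical scale: 0.5 g resin at 0.25 mmol/g. Compound I is omitted because
# its molecular weight is not given; add it as Reagent("I", 3.0, <its MW>).
step_a = [
    Reagent("HATU", 3.0, 380.23),
    Reagent("DIEA", 7.5, 129.25, density_g_per_ml=0.742),
]
for name, mmol, mg, ul in reagent_amounts(0.5, 0.25, step_a):
    volume = f", {ul:.0f} uL" if ul is not None else ""
    print(f"{name}: {mmol:.3f} mmol, {mg:.1f} mg{volume}")
```

Since Steps E and F of the second procedure divide the resin equally into 16 vessels, the same function can be applied per vessel by passing one sixteenth of the resin mass.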

Synthesis of Pentafluorophenyl Esters of Chemically Cleavable Mass Spectroscopy Tags, to Liberate Tags with Carboxyl Acid Termini

[0252] Figure 2 shows the reaction scheme.

Step A. 4-(Hydroxymethyl)phenoxybutyric acid (compound I; 1 eq.) is combined with DIEA (2.1 eq.) and allyl bromide (2.1 eq.) in CHCl3 and heated to reflux for 2 hr. The mixture is diluted with EtOAc, washed with 1 N HCl (2X), pH 9.5 carbonate buffer (2X), and brine (1X), dried over Na2SO4, and evaporated in vacuo to give the allyl ester of compound I.

Step B. The allyl ester of compound I from step A (1.75 eq.) is combined in CH2Cl2 with an FMOC-protected amino acid containing amine functionality in its side chain (compound II, e.g. alpha-N-FMOC-3-(3-pyridyl)-alanine, available from Synthetech, Albany, OR; 1 eq.), N-methylmorpholine (2.5 eq.), and HATU (1.1 eq.), and stirred at room temperature for 4 hr. The mixture is diluted with CH2Cl2, washed with 1 M aq. citric acid (2X), water (1X), and 5% aq. NaHCO3 (2X), dried over Na2SO4, and evaporated in vacuo. Compound III is isolated by flash chromatography (CH2Cl2 → EtOAc).

Step C. Compound III is dissolved in CH2Cl2. Pd(PPh3)4 (0.07 eq.) and N-methylaniline (2 eq.) are added, and the mixture stirred at room temperature for 4 hr. The mixture is diluted with CH2Cl2, washed with 1 M aq. citric acid (2X) and water (1X), dried over Na2SO4, and evaporated in vacuo. Compound IV is isolated by flash chromatography (CH2Cl2 → EtOAc + HOAc).

Step D. TentaGel S AC resin (compound V; 1 eq.) is suspended with DMF in the collection vessel of the ACT357 peptide synthesizer (Advanced ChemTech Inc. (ACT), Louisville, KY). Compound IV (3 eq.), HATU (3 eq.) and DIEA (7.5 eq.) in DMF are added and the collection vessel shaken for 1 hr. The solvent is removed and the resin washed with NMP (2X), MeOH (2X), and DMF (2X). The coupling of IV to the resin and the wash steps are repeated, to give compound VI.

Step E. The resin (compound VI) is mixed with 25% piperidine in DMF and shaken for 5 min. The resin is filtered, then mixed with 25% piperidine in DMF and shaken for 10 min. The solvent is removed and the resin washed with NMP (2X), MeOH (2X), and DMF (2X). The deprotected resin is then divided equally by the ACT357 from the collection vessel into 16 reaction vessels.

Step F. The 16 aliquots of deprotected resin from step E are suspended in DMF. To each reaction vessel is added the appropriate carboxylic acid VII₁₋₁₆ (R₁₋₁₆CO2H; 3 eq.), HATU (3 eq.), and DIEA (7.5 eq.) in DMF. The vessels are shaken for 1 hr. The solvent is removed and the aliquots of resin washed with NMP (2X), MeOH (2X), and DMF (2X). The coupling of VII₁₋₁₆ to the aliquots of resin and the wash steps are repeated, to give compounds VIII₁₋₁₆.

Step G. The aliquots of resin (compounds VIII1-16) are washed with CH2Cl2 (3X). To each of the reaction vessels is added 1% TFA in CH2Cl2, and the vessels are shaken for 30 min. The solvent is filtered from the reaction vessels into individual tubes. The aliquots of resin are washed with CH2Cl2 (2X) and MeOH (2X) and the filtrates combined into the individual tubes. The individual tubes are evaporated in vacuo, providing compounds IX1-16.

Step H. Each of the free carboxylic acids IX1-16 is dissolved in DMF. To each solution is added pyridine (1.05 eq.), followed by pentafluorophenyl trifluoroacetate (1.1 eq.). The mixtures are stirred for 45 min at room temperature. The solutions are diluted with EtOAc, washed with 1 M aq. citric acid (3X) and 5% aq. NaHCO3 (3X), dried over Na2SO4, filtered, and evaporated in vacuo, providing compounds X1-16.
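As a rough aid to Steps D to F, the sketch below converts the stated equivalents into absolute per-vessel amounts. It assumes a hypothetical total resin amount (the procedure states only equivalents), an equal 16-way split as in Step E, and literature values for the molecular weights of HATU (380.23 g/mol) and DIEA (129.25 g/mol) and the density of DIEA (0.742 g/mL). The name `per_vessel_amounts` is illustrative and not part of the procedure.

```python
"""Per-vessel reagent amounts for the split coupling in Step F (illustrative)."""

HATU_MW = 380.23      # g/mol, numerically equal to mg/mmol
DIEA_MW = 129.25      # g/mol, N,N-diisopropylethylamine
DIEA_DENSITY = 0.742  # g/mL, numerically equal to mg/uL


def per_vessel_amounts(total_resin_mmol: float, n_vessels: int = 16) -> dict:
    """Amounts for one reaction vessel, given the total resin (1 eq.) in mmol.

    total_resin_mmol is an assumed input; the procedure only states equivalents.
    """
    resin = total_resin_mmol / n_vessels  # Step E: resin divided equally
    acid = 3.0 * resin                    # carboxylic acid VII, 3 eq.
    hatu = 3.0 * resin                    # HATU, 3 eq.
    diea = 7.5 * resin                    # DIEA, 7.5 eq.
    return {
        "resin_mmol": resin,
        "acid_mmol": acid,
        "hatu_mg": hatu * HATU_MW,
        "diea_uL": diea * DIEA_MW / DIEA_DENSITY,
    }


if __name__ == "__main__":
    # Hypothetical example: 0.16 mmol of resin split over 16 vessels
    for name, value in per_vessel_amounts(0.16).items():
        print(f"{name}: {value:.3f}")
```

Because each coupling in Step F is performed twice, the total reagent consumption per vessel is twice the figures returned.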

EXAMPLE 2 

DEMONSTRATION OF PHOTOLYTIC CLEAVAGE OF T-L-X

[0253] A T-L-X compound as prepared in Example 13 was irradiated with near-UV light for 7 min at room temperature. A Rayonet fluorescence UV lamp (Southern New England Ultraviolet Co., Middletown, CT) with an emission peak at 350 nm is used as a source of UV light. The lamp is placed at a 15-cm distance from the Petri dishes containing the samples. SDS gel electrophoresis shows that >85% of the conjugate is cleaved under these conditions.
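If the photolysis is assumed to follow first-order kinetics (an assumption; the example reports only the end point), then the uncleaved fraction after time t would be exp(-kt), and >85% cleavage after 7 min bounds the rate constant k from below. The hypothetical helper below computes that bound and the corresponding upper bound on the half-life.

```python
import math


def min_first_order_rate(fraction_cleaved: float, minutes: float) -> float:
    """Lower bound on a first-order rate constant (1/min).

    Assumes cleavage follows first-order kinetics, so the uncleaved fraction
    after time t is exp(-k * t).
    """
    remaining = 1.0 - fraction_cleaved
    return -math.log(remaining) / minutes


k_min = min_first_order_rate(0.85, 7.0)   # ">85% cleaved" after 7 min
t_half_max = math.log(2) / k_min          # upper bound on the half-life, min
print(f"k >= {k_min:.3f} per min, half-life <= {t_half_max:.2f} min")
```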

EXAMPLE 3 

PREPARATION OF FLUORESCENT LABELED PRIMERS AND DEMONSTRATION OF CLEAVAGE OF FLUOROPHORE

Synthesis and Purification of Oligonucleotides 

[0254] The oligonucleotides (ODNs) are prepared on automated DNA synthesizers using the standard phosphoramidite chemistry supplied by the vendor, or the H-phosphonate chemistry (Glen Research, Sterling, VA). Appropriately blocked dA, dG, dC, and T phosphoramidites are commercially available in these forms, and synthetic nucleosides may readily be converted to the appropriate form. Oligonucleotides are purified by adaptations of standard methods. Oligonucleotides with 5'-trityl groups are chromatographed on HPLC using a 12 µm, 300 Å Rainin (Emeryville, CA) Dynamax C-8 4.2 x 250 mm reverse-phase column using a gradient of 15% to 55% MeCN in 0.1 N Et3NH+OAc-, pH 7.0, over 20 min. When detritylation is performed, the oligonucleotides are further purified by gel exclusion chromatography. Analytical checks for the quality of the oligonucleotides are conducted with a PRP column (Alltech, Deerfield, IL) at alkaline pH and by PAGE.

[0255] Preparation of 2,4,6-trichlorotriazine-derived oligonucleotides: 10 to 1000 µg of 5'-terminal amine-linked oligonucleotide are reacted with an excess of recrystallized cyanuric chloride in 10% N-methylpyrrolidone in alkaline buffer (preferably pH 8.3 to 8.5) at 19°C to 25°C for 30 to 120 minutes. The final reaction conditions consist of 0.15 M sodium borate at pH 8.3, 2 mg/ml recrystallized cyanuric chloride, and 500 µg/ml of the respective oligonucleotide. The unreacted cyanuric chloride is removed by size exclusion chromatography on a G-50 Sephadex (Pharmacia, Piscataway, NJ) column.

[0256] The activated purified oligonucleotide is then reacted with a 100-fold molar excess of cystamine in 0.15 M sodium borate at pH 8.3 for 1 hour at room temperature. The unreacted cystamine is removed by size exclusion chromatography on a G-50 Sephadex column. The derived ODNs are then reacted with amine-reactive fluorochromes. The derived ODN preparation is divided into 3 portions, and each portion is reacted with either (a) a 20-fold molar excess of Texas Red sulfonyl chloride (Molecular Probes, Eugene, OR), (b) a 20-fold molar excess of Lissamine sulfonyl chloride (Molecular Probes, Eugene, OR), or (c) a 20-fold molar excess of fluorescein isothiocyanate. The final reaction conditions consist of 0.15 M sodium borate at pH 8.3 for 1 hour at room temperature. The unreacted fluorochromes are removed by size exclusion chromatography on a G-50 Sephadex column.

[0257] To cleave the fluorochrome from the oligonucleotide, the ODNs are adjusted to 1 x 10^-5 molar and then dilutions are made (12 3-fold dilutions) in TE (TE is 0.01 M Tris, pH 7.0, 5 mM EDTA). To 100 µl volumes of ODNs, 25 µl of 0.01 M dithiothreitol (DTT) is added. To an identical set of controls, no DTT is added. The mixture is incubated for 15 minutes at room temperature. Fluorescence is measured in a black microtiter plate: the solution is removed from the incubation tubes (150 microliters) and placed in a black microtiter plate (Dynatech Laboratories, Chantilly, VA). The plates are then read directly using a Fluoroskan II fluorometer (Flow Laboratories, McLean, VA), using an excitation wavelength of 495 nm and monitoring emission at 520 nm for fluorescein, an excitation wavelength of 591 nm and emission at 612 nm for Texas Red, and an excitation wavelength of 570 nm and emission at 590 nm for Lissamine.



Fluorochrome concentration    RFU non-cleaved    RFU cleaved    RFU free
1.0 x 10^-5 M                 6.4                1200           1345
3.3 x 10^-6 M                 2.4                451            456
1.1 x 10^-6 M                 0.9                135            130
3.7 x 10^-7 M                 0.3                44             48
1.2 x 10^-7 M                 0.12               15.3           16.0
4.1 x 10^-8 M                 0.14               4.9            5.1
1.4 x 10^-8 M                 0.13               2.5            2.8
4.5 x 10^-9 M                 0.12               0.8            0.9

RFU = relative fluorescence units.



[0258] The data indicate that there is about a 200-fold increase in relative fluorescence when the fluorochrome is cleaved from the ODN.
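The concentration column follows a 3-fold serial dilution from 1 x 10^-5 M, and the fold increase can be computed row by row. The sketch below uses only the tabulated values; the function names are illustrative.

```python
# (concentration in M, RFU non-cleaved, RFU cleaved, RFU free), from the table
ROWS = [
    (1.0e-5, 6.4, 1200.0, 1345.0),
    (3.3e-6, 2.4, 451.0, 456.0),
    (1.1e-6, 0.9, 135.0, 130.0),
    (3.7e-7, 0.3, 44.0, 48.0),
    (1.2e-7, 0.12, 15.3, 16.0),
    (4.1e-8, 0.14, 4.9, 5.1),
    (1.4e-8, 0.13, 2.5, 2.8),
    (4.5e-9, 0.12, 0.8, 0.9),
]


def dilution_series(start_molar: float, fold: float, n: int) -> list:
    """Concentrations of a serial dilution: start, start/fold, start/fold**2, ..."""
    return [start_molar / fold ** i for i in range(n)]


def fold_increase(rows) -> list:
    """Ratio of cleaved to non-cleaved fluorescence for each row."""
    return [cleaved / uncleaved for _, uncleaved, cleaved, _ in rows]


expected = dilution_series(1.0e-5, 3.0, len(ROWS))
for (conc, *_), nominal, ratio in zip(ROWS, expected, fold_increase(ROWS)):
    print(f"{conc:.1e} M (nominal {nominal:.2e} M): {ratio:6.1f}-fold")
```

The ratio is about 188 at the two highest concentrations and falls toward the bottom of the series, where the non-cleaved readings level off at roughly 0.12 to 0.14 RFU.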

EXAMPLE 4 

PREPARATION OF TAGGED M13 SEQUENCE PRIMERS AND DEMONSTRATION OF CLEAVAGE OF TAGS 

[0259] Preparation of 2,4,6-trichlorotriazine-derived oligonucleotides: 1000 µg of 5'-terminal amine-linked oligonucleotide (5'-hexylamine-TGTAAAACGACGGCCAGT-3') (Seq. ID No. 1) are reacted with an excess of recrystallized cyanuric chloride in 10% N-methylpyrrolidone in alkaline buffer (preferably pH 8.3 to 8.5) at 19°C to 25°C for 30 to 120 minutes. The final reaction conditions consist of 0.15 M sodium borate at pH 8.3, 2 mg/ml recrystallized cyanuric chloride, and 500 µg/ml of the respective oligonucleotide. The unreacted cyanuric chloride is removed by size exclusion chromatography on a G-50 Sephadex column.

[0260] The activated purified oligonucleotide is then reacted with a 100-fold molar excess of cystamine in 0.15 M [...]

[...] allowed to polymerize for at least 30 minutes. Prior to loading, the tape around the bottom of the gel and the well-forming comb are removed. A vertical electrophoresis apparatus is then assembled by clamping the upper and lower buffer chambers to the gel plates and adding 1X MTBE electrophoresis buffer to the chambers. Sample wells are flushed with a syringe containing running buffer, and immediately prior to loading each sample, the well is flushed with running buffer using gel-loading tips to remove urea. One to two microliters of sample are loaded into each well using a Pipetman (Rainin, Emeryville, CA) with gel-loading tips and then electrophoresed according to the following guidelines (during electrophoresis, the gel is cooled with a fan):



termination reaction    polyacrylamide gel              electrophoresis conditions
short                   5%, 0.15 mm x 50 cm x 20 cm     2.25 hours at 22 mA
long                    4%, 0.15 mm x 70 cm x 20 cm     8-9 hours at 15 mA
long                    4%, 0.15 mm x 70 cm x 20 cm     20-24 hours at 15 mA



[0307] Each base-specific sequencing reaction terminated with the short termination mix is loaded onto a 0.15 mm x 50 cm x 20 cm denaturing 5% polyacrylamide gel; reactions terminated with the long termination mix typically are divided in half and loaded onto two 0.15 mm x 70 cm x 20 cm denaturing 4% polyacrylamide gels.

[0308] After electrophoresis, buffer is removed from the wells, the tape is removed, and the gel plates are separated. The gel is transferred to a 40 cm x 20 cm sheet of 3MM Whatman paper, covered with plastic wrap, and dried on a Hoefer (San Francisco, CA) gel dryer for 25 minutes at 80°C. The dried gel is exposed to Kodak (New Haven, CT) XRP-1 film. Depending on the intensity of the signal and whether the radiolabel is 32P or 35S, exposure times vary from 4 hours to several days. After exposure, films are developed by processing in developer and fixer solutions, rinsed with water, and air dried. The autoradiogram is then placed on a light box, the sequence is read manually, and the data are typed into a computer.

[0309] Taq polymerase-catalyzed cycle sequencing using labeled primers. Each base-specific cycle sequencing reaction routinely included approximately 100 or 200 ng of isolated single-stranded DNA for the A and C or the G and T reactions, respectively. Double-stranded cycle sequencing reactions similarly contained approximately 200 or 400 ng of plasmid DNA isolated using either the standard alkaline lysis or the diatomaceous earth-modified alkaline lysis procedure. All reagents except template DNA are added in one pipetting step from a premix of previously aliquoted stock solutions stored at -20°C. Reaction premixes are prepared by combining reaction buffer with the base-specific nucleotide mixes. Prior to use, the base-specific reaction premixes are thawed and combined with diluted Taq DNA polymerase and the individual end-labeled universal primers to yield the final reaction mixes. Once the above mixes are prepared, four aliquots of single- or double-stranded DNA are pipetted into the bottom of each 0.2 ml thin-walled reaction tube, corresponding to the A, C, G, and T reactions, and then an aliquot of the respective reaction mix is added to the side of each tube. These tubes are part of a 96-tube/retainer set tray in a microtiter plate format, which fits into a Perkin-Elmer Cetus Cycler 9600 (Foster City, CA). Strip caps are sealed onto the tube/retainer set and the plate is centrifuged briefly. The plate is then placed in the cycler, whose heat block has been preheated to 95°C, and the cycling program is started immediately. The cycling protocol consists of 15-30 cycles of: 95°C denaturation; 55°C annealing; 72°C extension; 95°C denaturation; 72°C extension; 95°C denaturation; and 72°C extension, linked to a 4°C final soak file.

[0310] At this stage, the reactions may be frozen and stored at -20°C for up to several days. Prior to pooling and precipitation, the plate is centrifuged briefly to reclaim condensation. The primer and base-specific reactions are pooled into ethanol, and the precipitated DNA is collected by centrifugation and dried. These sequencing reactions can be stored for several days at -20°C.

[0311] The protocol for the sequencing reactions is as follows. For A and C reactions, 1 µl, and for G and T reactions, 2 µl of each DNA sample (100 ng/µl for M13 templates and 200 ng/µl for pUC templates) are pipetted into the bottom of the 0.2 ml thin-walled reaction tubes. AmpliTaq polymerase (N801-0060) is from Perkin-Elmer Cetus (Foster City, CA).

[0312] A mix of 30 µl AmpliTaq (5 U/µl), 30 µl 5X Taq reaction buffer, and 130 µl ddH2O is prepared, giving 190 µl of diluted Taq for 24 clones.

[0313] A, C, G, and T base-specific mixes are prepared by adding base-specific primer and diluted Taq to each of the base-specific nucleotide/buffer premixes:

A, C / G, T
60/120 µl 5X Taq cycle sequencing mix
30/60 µl diluted Taq polymerase
30/60 µl respective fluorescent end-labeled primer
120/240 µl total
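The volumes in [0312] and [0313] can be cross-checked arithmetically. The sketch below, with illustrative dictionary names, confirms that the listed components sum to the stated 120/240 µl and 190 µl totals and derives the activity of the diluted Taq, taking the AmpliTaq stock as 5 U/µl as stated.

```python
# Base-specific mix recipe from [0313]: volume in ul for (A/C, G/T) reactions
BASE_SPECIFIC_MIX = {
    "5X Taq cycle sequencing mix": (60, 120),
    "diluted Taq polymerase": (30, 60),
    "end-labeled primer": (30, 60),
}

# Diluted Taq from [0312]: component volumes in ul (enough for 24 clones)
DILUTED_TAQ = {"AmpliTaq (5 U/ul)": 30, "5X Taq reaction buffer": 30, "ddH2O": 130}
AMPLITAQ_U_PER_UL = 5.0


def mix_totals(recipe: dict) -> tuple:
    """Sum the A/C and G/T columns of a base-specific recipe."""
    ac = sum(v[0] for v in recipe.values())
    gt = sum(v[1] for v in recipe.values())
    return ac, gt


diluted_volume = sum(DILUTED_TAQ.values())
taq_u_per_ul = DILUTED_TAQ["AmpliTaq (5 U/ul)"] * AMPLITAQ_U_PER_UL / diluted_volume
print(mix_totals(BASE_SPECIFIC_MIX), diluted_volume, round(taq_u_per_ul, 2))
```

On these figures the diluted Taq works out to roughly 0.79 U/µl (150 U in 190 µl).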

B. Taq polymerase-catalyzed cycle sequencing using labeled terminators

[0314] [...] sequencing is that when primers are used, the reaction conditions are such that the nested fragment set distribution is highly dependent upon the template concentration in the reaction mix. It has recently been observed that the nested fragment set distribution for DNA cycle sequencing reactions using labeled terminators is much less sensitive to DNA concentration than that obtained with the labeled primer reactions described above. In addition, the terminator reactions require only one reaction tube per template, while the labeled primer reactions require one reaction tube for each of the four terminators. The protocol described below is easily interfaced with the 96-well template isolation and 96-well reaction clean-up procedures also described herein.

[0315] Place 0.5 µg of single-stranded or 1 µg of double-stranded DNA in 0.2 ml PCR tubes. Add 1 µl (for single-stranded templates) or 4 µl (for double-stranded templates) of 0.8 µM primer and 9.5 µl of ABI-supplied premix to each tube, and bring the final volume to 20 µl with ddH2O. Centrifuge briefly and cycle as usual using the terminator program described by the manufacturer (i.e., preheat at 96°C followed by 25 cycles of 96°C for 15 seconds, 50°C for 1 second, and 60°C for 4 minutes, and then link to a 4°C hold). Proceed with the spin column purification using either the Centri-Sep columns (Amicon, Beverly, MA) or the G-50 microtiter plate procedure given below.
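The two thermal programs in this section, the terminator program in [0315] and the insert-amplification PCR in [0335], can be written down as data. The sketch below is a hypothetical representation, not an instrument file format, and sums only the programmed hold times; ramp times, the preheat, and the final 4°C hold are deliberately excluded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hold:
    temp_c: float
    seconds: float


# Terminator program from [0315]: 96 C 15 s, 50 C 1 s, 60 C 4 min
TERMINATOR_CYCLE = [Hold(96, 15), Hold(50, 1), Hold(60, 240)]
# Insert-amplification PCR from [0335]: 95 C 1 min, 55 C 1 min, 72 C 2 min
INSERT_PCR_CYCLE = [Hold(95, 60), Hold(55, 60), Hold(72, 120)]


def hold_minutes(cycle: list, n_cycles: int) -> float:
    """Total time spent at the programmed temperatures, ignoring ramps,
    the initial preheat, and the final 4 C hold."""
    return n_cycles * sum(step.seconds for step in cycle) / 60.0


print(f"terminator: {hold_minutes(TERMINATOR_CYCLE, 25):.1f} min")
print(f"insert PCR: {hold_minutes(INSERT_PCR_CYCLE, 25):.1f} min")
```

Real run times are longer than these sums because the block must ramp between temperatures.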

C. Terminator Reaction Clean-Up via Centri-Sep Spin Columns

[0316] A column is prepared by gently tapping the column to cause the gel material to settle to the bottom of the column. The column stopper is removed and 0.75 ml dH2O is added. Stopper the column and invert it several times to mix. Allow the gel to hydrate for at least 30 minutes at room temperature. Columns can be stored for a few days at 4°C; allow columns that have been stored at 4°C to warm to room temperature before use. Remove any air bubbles by inverting the column and allowing the gel to settle. Remove the upper-end cap first and then remove the lower-end cap. Allow the column to drain completely by gravity. (Note: If flow does not begin immediately, apply gentle pressure to the column with a pipet bulb.) Insert the column into the wash tube provided. Spin in a variable-speed microcentrifuge at 1300 x g for 2 minutes to remove the fluid. Remove the column from the wash tube and insert it into a sample collection tube. Carefully remove the reaction mixture (20 µl) and load it on top of the gel material. If the samples were incubated in a cycling instrument that required overlaying with oil, carefully remove the reaction from beneath the oil. Avoid picking up oil with the sample, although small amounts of oil (<1 µl) in the sample will not affect results. Oil at the end of the pipet tip containing the sample can be removed by touching the tip carefully on a clean surface (e.g., the reaction tube). Use each column only once. Spin in a variable-speed microcentrifuge with a fixed-angle rotor, placing the column in the same orientation as it was in for the first spin. Dry the sample in a vacuum centrifuge. Do not apply heat or overdry. If desired, reactions can be precipitated with ethanol.

D. Terminator Reaction Clean-Up via Sephadex G-50 Filled Microtiter Format Filter Plate

[0317] Sephadex (Pharmacia, Piscataway, NJ) settles out; therefore, it must be resuspended before adding it to the plate and again after filling every 8 to 10 wells. Add 400 µl of mixed Sephadex G-50 to each well of the microtiter filter plate. Place the microtiter filter plate on top of a microtiter plate to collect water, and tape the sides so they do not fly apart during centrifugation. Spin at 1500 rpm for 2 minutes. Discard the water that has been collected in the microtiter plate. Place the microtiter filter plate on top of a microtiter plate to collect water, and tape the sides so they do not fly apart during centrifugation. Add an additional 100-200 µl of Sephadex G-50 to fill the microtiter plate wells. Spin at 1500 rpm for 2 minutes. Discard the water that has been collected in the microtiter plate. Place the microtiter filter plate on top of a microtiter plate with tubes to collect the effluent, and tape the sides so they do not fly apart during centrifugation. Add 20 µl of terminator reaction to each Sephadex G-50-containing well. Spin at 1500 rpm for 2 minutes. Place the collected effluent in a Speed-Vac for approximately 1-2 hours.
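The clean-up steps specify centrifugation in two ways: 1300 x g for the spin columns and 1500 rpm for the filter plates. Converting between them requires the rotor radius, which the text does not give. The sketch below uses the standard relation RCF = 1.118 x 10^-5 x r(cm) x rpm^2 with hypothetical radii chosen only for illustration.

```python
import math

RCF_CONSTANT = 1.118e-5  # RCF = 1.118e-5 * radius_cm * rpm**2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at the given rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed needed to reach a target relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Hypothetical radii: 10 cm for a plate rotor, 7 cm for a microcentrifuge
print(f"1500 rpm at 10 cm ~ {rcf_from_rpm(1500, 10):.0f} x g")
print(f"1300 x g at 7 cm needs ~ {rpm_from_rcf(1300, 7):.0f} rpm")
```

The actual radius should be taken from the rotor's documentation before converting a protocol between instruments.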

[0318] Sequenase™ (USB, Cleveland, OH)-catalyzed sequencing with labeled terminators. Single-stranded terminator reactions require approximately 2 µg of phenol-extracted M13-based template DNA. The DNA is denatured and the primer annealed by incubating DNA, primer, and buffer at 65°C. After the reaction has cooled to room temperature, alpha-thio-deoxynucleotides, labeled terminators, and diluted Sequenase™ DNA polymerase are added and the mixture is incubated at 37°C. The reaction is stopped by adding ammonium acetate and ethanol, and the DNA fragments are precipitated and dried. To aid in the removal of unincorporated terminators, the DNA pellet is rinsed twice with ethanol. The dried sequencing reactions can be stored for up to several days at -20°C.

[0319] Double-stranded terminator reactions require approximately 5 µg of plasmid DNA purified by the diatomaceous earth-modified alkaline lysis midi-prep. The double-stranded DNA is denatured by incubating it in sodium hydroxide at 65°C; after incubation, primer is added and the reaction is neutralized by adding an acid buffer. Reaction buffer, alpha-thio-deoxynucleotides, labeled dye terminators, and diluted Sequenase™ DNA polymerase are then added and the reaction is incubated at 37°C. Ammonium acetate is added to stop the reaction, and the DNA fragments are precipitated, rinsed, dried, and stored.

[0320] For single-stranded reactions:
Add the following to a 1.5 ml microcentrifuge tube:

4 µl ss DNA (2 µg)
4 µl 0.8 µM primer
2 µl 10x MOPS buffer
2 µl 10x Mn2+/isocitrate buffer
12 µl total

[0321] To denature the DNA and anneal the primer, incubate the reaction at 65°C-70°C for 5 minutes. Allow the reaction to cool at room temperature for 15 minutes, and then briefly centrifuge to reclaim condensation. To each reaction, add the following reagents and incubate for 10 minutes at 37°C:

7 µl ABI terminator mix (Catalogue No. 401489)
2 µl diluted Sequenase™ (3.25 U/µl)
1 µl 2 mM α-S dNTPs
22 µl total

[0322] The undiluted Sequenase™ (Catalogue No. 70775, United States Biochemicals, Cleveland, OH) is 13 U/µl and is diluted 1:4 with USB dilution buffer prior to use. Add 20 µl of 9.5 M ammonium acetate and 100 µl of 95% ethanol to stop the reaction, and mix.

[0323] Precipitate the DNA in an ice-water bath for 10 minutes. Centrifuge for 15 minutes at 10,000 x g in a microcentrifuge at 4°C. Carefully decant the supernatant and rinse the pellet by adding 300 µl of 70-80% ethanol. Mix and centrifuge again for 15 minutes, and carefully decant the supernatant.

[0324] Repeat the rinse step to ensure efficient removal of the unincorporated terminators. Dry the DNA for 5-10 minutes (or until dry) in the Speed-Vac, and store the dried reactions at -20°C.
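The single-stranded Sequenase reaction in [0320] to [0322] can be checked arithmetically: the listed volumes sum to the stated 12 µl and 22 µl totals, and a 1:4 dilution of the 13 U/µl stock gives the 3.25 U/µl quoted for the diluted enzyme. The sketch reads "1:4" as one volume of enzyme in four volumes total, which is the reading consistent with 3.25 U/µl; variable names are illustrative.

```python
STOCK_U_PER_UL = 13.0   # undiluted Sequenase, per [0322]
DILUTION = 4            # "1:4" read as 1 volume of enzyme in 4 volumes total

# Single-stranded reaction volumes (ul) from [0320] and [0321]
ANNEALING_MIX = {"ss DNA (2 ug)": 4, "0.8 uM primer": 4, "10x MOPS buffer": 2,
                 "10x Mn2+/isocitrate buffer": 2}
LABELING_ADDITIONS = {"ABI terminator mix": 7, "diluted Sequenase": 2,
                      "2 mM alpha-S dNTPs": 1}

diluted_u_per_ul = STOCK_U_PER_UL / DILUTION
anneal_volume = sum(ANNEALING_MIX.values())
final_volume = anneal_volume + sum(LABELING_ADDITIONS.values())
enzyme_units = LABELING_ADDITIONS["diluted Sequenase"] * diluted_u_per_ul
print(diluted_u_per_ul, anneal_volume, final_volume, enzyme_units)
```

On this reading each single-stranded reaction receives 6.5 U of Sequenase.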

[0325] For double-stranded reactions:

Add the following to a 1.5 ml microcentrifuge tube:

5 µl ds DNA (5 µg)
4 µl 1 N NaOH
3 µl ddH2O

[0326] Incubate the reaction at 65°C-70°C for 5 minutes, and then briefly centrifuge to reclaim condensation. Add the following reagents to each reaction, vortex, and briefly centrifuge:

3 µl 8 mM primer
9 µl ddH2O
4 µl MOPS-acid buffer

[0327] To each reaction, add the following reagents and incubate for 10 minutes at 37°C:

4 µl 10x Mn2+/isocitrate buffer
6 µl ABI terminator mix
2 µl diluted Sequenase™ (3.25 U/µl)
1 µl 2 mM α-S dNTPs
22 µl

[0328] The undiluted Sequenase™ from United States Biochemicals is 13 U/µl and should be diluted 1:4 with USB dilution buffer prior to use. Add 60 µl of 8 M ammonium acetate and 300 µl of 95% ethanol to stop the reaction, and vortex. Precipitate the DNA in an ice-water bath for 10 minutes. Centrifuge for 15 minutes at 10,000 x g in a microcentrifuge at 4°C. Carefully decant the supernatant, and rinse the pellet by adding 300 µl of 80% ethanol. Mix the sample and centrifuge again for 15 minutes, and carefully decant the supernatant. Repeat the rinse step to ensure efficient removal of the unincorporated terminators. Dry the DNA for 5-10 minutes (or until dry) in the Speed-Vac.

E. Pre-electrophoresis, sample loading, electrophoresis, data collection and analysis on the ABI 373A DNA sequencer

[0329] Polyacrylamide gels for DNA sequencing are prepared as described above, except that the gel mix is filtered prior to polymerization. Glass plates are carefully cleaned with hot water, distilled water, and ethanol to remove potential fluorescent contaminants prior to taping. Denaturing 6% polyacrylamide gels are poured into 0.3 mm x 89 cm x 52 cm taped plates and fitted with a 36-well comb. After polymerization, the tape and the comb are removed from the gel, and the outer surfaces of the glass plates are cleaned with hot water and rinsed with distilled water and ethanol. The gel is assembled into an ABI sequencer and then checked by laser scanning. If baseline alterations are observed on the ABI-associated Macintosh computer display, the plates are recleaned. Subsequently, the buffer wells are attached, electrophoresis buffer is added, and the gel is pre-electrophoresed for 10-30 minutes at 30 W. Prior to sample loading, the pooled and dried reaction products are resuspended in formamide/EDTA loading buffer by vortexing and then heated at 90°C. A sample sheet is created within the ABI data collection software on the Macintosh computer; it indicates the number of samples loaded and the fluorescent-labeled mobility file to use for sequence data processing. After the sample wells are cleaned with a syringe, the odd-numbered sequencing reactions are loaded into the respective wells using a micropipettor equipped with a flat-tipped gel-loading tip. The gel is then electrophoresed for 5 minutes before the wells are cleaned again and the even-numbered samples are loaded. The filter wheel used for dye primers and dye terminators is specified on the ABI 373A CPU. Typically, electrophoresis and data collection run for 10 hours at 30 W on an ABI 373A fitted with a heat-distributing aluminum plate. After data collection, the ABI software creates an image file that relates the detected fluorescent signal to the corresponding scan number. The software then determines the sample lane positions based on the signal intensities. After the lanes are tracked, the cross-section of data for each lane is extracted and processed by baseline subtraction, mobility calculation, spectral deconvolution, and time correction. After processing, the sequence data files are transferred to a SPARCstation 2 using NFS Share.

[0330] Protocol: prepare 8 M urea, 4.75% polyacrylamide gels, as described above, using a 36-well comb. Prior to loading, clean the outer surface of the gel plates. Assemble the gel plates into an ABI 373A DNA Sequencer (Foster City, CA) so that the lower scan line (usually the blue line) corresponds to an intensity value of 800-1000 as displayed in the computer data collection window. If the baseline of the four-color scan lines is not flat, reclean the glass plates. Affix the aluminum heat-distribution plate. Pre-electrophorese the gel for 10-30 minutes. Prepare the samples for loading: add 3 µl of FE (formamide/EDTA loading buffer) to the bottom of each tube, vortex, heat at 90°C for 3 minutes, and centrifuge to reclaim condensation. Flush the sample wells with electrophoresis buffer using a syringe. Using flat-tipped gel-loading pipette tips, load each odd-numbered sample. Pre-electrophorese the gel for at least 5 minutes, flush the wells again, and then load each even-numbered sample. Begin the electrophoresis (30 W for 10 hours). After data collection, the ABI software automatically opens the data analysis software, which creates the imaged gel file, extracts the data for each sample lane, and processes the data.

F. Double-stranded sequencing of cDNA clones containing long poly(A) tails using anchored poly(dT) primers

[0331] Double-stranded templates of cDNAs containing long poly(A) tracts are difficult to sequence with vector primers that anneal downstream of the poly(A) tail. Sequencing with these primers results in a long poly(T) ladder followed by sequence that may be difficult to read. To circumvent this problem, three primers that contain (dT)17 and either (dA), (dC), or (dG) at the 3' end were designed to "anchor" the primers and allow sequencing of the region immediately upstream of the poly(A) region. Using this protocol, over 300 bp of readable sequence could be obtained. The sequence of the opposite strand of these cDNAs was determined using insert-specific primers upstream of the poly(A) region. The ability to obtain sequence directly upstream of the poly(A) tail of cDNAs should be of particular importance to large-scale efforts to generate sequence-tagged sites (STSs) from cDNAs.
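The anchoring logic can be illustrated with a short sketch. A primer of the form 5'-(dT)17V-3' pairs its 3'-terminal base V with the base just upstream of the poly(A) tail. That base cannot be A (otherwise it would be part of the tail), so V is A, C, or G, and a mixture of the three primers covers every template. The function names and the example sequences are hypothetical.

```python
ANCHORS = ("A", "C", "G")  # no T anchor: a T would only extend the oligo(dT) run
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def anchored_primers(n_t: int = 17) -> dict:
    """The three anchored primers, 5'-(dT)n-V-3' with V = A, C or G."""
    return {anchor: "T" * n_t + anchor for anchor in ANCHORS}


def matching_anchor(sense_seq: str) -> str:
    """Anchor base that pairs with the base just upstream of the poly(A) tail.

    sense_seq is a hypothetical sense-strand sequence (5'->3') ending in the
    poly(A) tail; the anchored primer anneals to this strand.
    """
    body = sense_seq.rstrip("A")  # strip the poly(A) tail
    return COMPLEMENT[body[-1]]


primers = anchored_primers()
print(primers["G"], len(primers["G"]))
print(matching_anchor("GGCATCC" + "A" * 25))
```

Only the primer whose anchor matches can pair at its 3' end, which fixes the priming site at the tail junction instead of anywhere along the poly(A) run.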

[0332] The protocol is as follows. Synthesize anchored poly(dT)17 with anchors of (dA), (dC), or (dG) at the 3' end on a DNA synthesizer, and use after purification on Oligonucleotide Purification Cartridges (Amicon, Beverly, MA). For sequencing with anchored primers, denature 5 µg of plasmid DNA in a total volume of 50 µl containing 0.2 M sodium hydroxide and 0.16 mM EDTA by incubation at 65°C for 10 minutes. Add the three poly(dT) anchored primers (2 pmol of each) and immediately place the mixture on ice. Neutralize the solution by adding 5 µl of 5 M ammonium acetate, pH 7.0.

[0333] Precipitate the DNA by adding 150 µl of cold 95% ethanol and wash the pellet twice with cold 70% ethanol. Dry the pellet for 5 minutes and then resuspend it in MOPS buffer. Anneal the primers by heating the solution for 2 minutes [...], then cooling to room temperature for 15-30 minutes. Perform sequencing reactions using modified T7 DNA polymerase and [α-32P]dATP (>1000 Ci/mmol), following the protocol described above.

G. cDNA sequencing based on PCR and random shotgun cloning

[0334] The following is a method for sequencing cloned cDNAs based on PCR amplification, random shotgun cloning, and automated fluorescent sequencing. This PCR-based approach uses a primer pair between the usual "universal" forward and reverse priming sites and the multiple cloning sites of the Stratagene Bluescript vector. These two PCR primers, with the sequence 5'-TCGAGGTCGACGGTATCG-3' (Seq. ID No. 15) for the forward or -16bs primer and 5'-GCCGCTCTAGAACTAGTG-3' (Seq. ID No. 16) for the reverse or +19bs primer, may be used to amplify sufficient quantities of cDNA inserts in the 1.2 to 3.4 kb size range so that the random shotgun sequencing approach described below can be implemented.

[0335] The following is the protocol. Incubate four 100 µl PCR reactions, each containing approximately 100 ng of plasmid DNA, 100 pmol of each primer, 50 mM KCl, 10 mM Tris-HCl pH 8.5, 1.5 mM MgCl2, 0.2 mM of each dNTP, and 5 units of PE-Cetus AmpliTaq, in 0.5 ml snap-cap tubes for 25 cycles of 95°C for 1 minute, 55°C for 1 minute, and 72°C for 2 minutes in a PE-Cetus 48-tube DNA Thermal Cycler. After the four reactions are pooled, the aqueous solution containing the PCR product is placed in a nebulizer, brought to 2.0 ml by adding approximately 0.5 to 1.0 ml of glycerol, and equilibrated at -20°C by placing it in either an isopropyl alcohol/dry ice or a saturated aqueous NaCl/dry ice bath for 10 minutes. The sample is nebulized at -20°C by applying 25-30 psi nitrogen pressure for 2.5 min. Following ethanol precipitation to concentrate the sheared PCR product, the fragments were blunt-ended and phosphorylated by incubation with the Klenow fragment of E. coli DNA polymerase and T4 polynucleotide kinase as described previously. Fragments in the 0.4 to 0.7 kb range were obtained by elution from a low-melting agarose gel.

[0336] From the foregoing, it will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention. Accordingly, the invention is not limited except as by the appended claims.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANTS: Van Ness, Jeffrey
Tabone, John C.
Howbert, J. Jeffry
Mulligan, John T.

(ii) TITLE OF INVENTION: METHODS AND COMPOSITIONS FOR DETERMINING
THE SEQUENCE OF NUCLEIC ACID MOLECULES

(iii) NUMBER OF SEQUENCES: 16

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: SEED and BERRY

(B) STREET: 6300 Columbia Center, 701 Fifth Avenue

(C) CITY: Seattle

(D) STATE: Washington

(E) COUNTRY: USA

(F) ZIP: 98104-7092

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC Compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:

(B) FILING DATE: 22-JAN-1997

(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: McMasters, David D.

(B) REGISTRATION NUMBER: 33,963

(C) REFERENCE/DOCKET NUMBER: 240052.416

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (206) 622-4900

(B) TELEFAX: (206) 682-6031



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
TGTAAAACGA CGGCCAGT

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
TGTAAAACGA CGGCCAGTA

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
TGTAAAACGA CGGCCAGTAT

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
TGTAAAACGA CGGCCAGTAT G

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
TGTAAAACGA CGGCCAGTAT GC

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
TGTAAAACGA CGGCCAGTAT GCA

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
TGTAAAACGA CGGCCAGTAT GCAT

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
TGTAAAACGA CGGCCAGTAT GCATG

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
TGTAAAACGA CGGCCACG
(2) INFORMATION FOR SEQ ID NO: 10: 

(1) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
TGTAAAACGA CGGCCAGCG
(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
TGTAAAACGA CGGCCAGCGT
(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
TGTAAAACGA CGGCCAGCGT A
(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
TGTAAAACGA CGGCCAGCGT AC

(2) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
TGTAAAACGA CGGCCAGCGT ACC 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:
TCGAGGTCGA CGGTATCG 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:
GCCGCTCTAG AACTAGTG
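As transcribed above, SEQ ID NOs 2 to 8 form a nested series in which each entry extends the preceding one by a single 3' base. The sketch below, whose helper names are hypothetical, checks that reading against the declared LENGTH fields. It is limited to SEQ ID NOs 2 to 8.

```python
# SEQ ID NOs 2 to 8 as transcribed in the listing above, each paired with
# the LENGTH value declared in its SEQUENCE CHARACTERISTICS entry.
LISTING = {
    2: ("TGTAAAACGACGGCCAGTA", 19),
    3: ("TGTAAAACGACGGCCAGTAT", 20),
    4: ("TGTAAAACGACGGCCAGTATG", 21),
    5: ("TGTAAAACGACGGCCAGTATGC", 22),
    6: ("TGTAAAACGACGGCCAGTATGCA", 23),
    7: ("TGTAAAACGACGGCCAGTATGCAT", 24),
    8: ("TGTAAAACGACGGCCAGTATGCATG", 25),
}


def check_nested_series(listing):
    """Return a list of problems; an empty list means every entry has its
    declared length and extends the preceding entry by one 3' base."""
    problems = []
    ids = sorted(listing)
    for seq_id in ids:
        seq, declared = listing[seq_id]
        if len(seq) != declared:
            problems.append(f"SEQ ID NO:{seq_id}: {len(seq)} bases, declared {declared}")
    for prev_id, cur_id in zip(ids, ids[1:]):
        prev_seq, cur_seq = listing[prev_id][0], listing[cur_id][0]
        if len(cur_seq) != len(prev_seq) + 1 or not cur_seq.startswith(prev_seq):
            problems.append(f"SEQ ID NO:{cur_id} does not extend SEQ ID NO:{prev_id}")
    return problems


def added_bases(listing):
    """The 3' base added at each step of the series, in order."""
    ids = sorted(listing)
    return "".join(listing[i][0][-1] for i in ids[1:])


print(check_nested_series(LISTING))  # []
print(added_bases(LISTING))          # TGCATG
```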
Claims

1. A method comprising: 

(a) providing DNA fragments, each fragment having cleavably attached thereto a mass tag;

(b) separating the tagged fragments on the basis of fragment charge, sequential length, size or shape;

(c) placing a charge on the tag and cleaving the tag from the fragment, where the charge is placed on the tag prior to, during, or after cleaving the tag from the fragment; and

(d) determining each tag by mass spectrometry.
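Step (c) places a charge on each tag, and claims 4, 5 and 15 to 17 require a group that supports a single ionized charge state. The following is a minimal sketch of why that matters for step (d). It assumes the charge is one added proton, for example on a tertiary amine, so each tag appears at a single m/z value equal to its neutral mass plus one proton mass. The 300.000 Da tag mass is hypothetical.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz_protonated(neutral_mass: float, charge: int = 1) -> float:
    """m/z of [M + zH]^z+ for a tag of neutral monoisotopic mass M (Da)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# A hypothetical 300.000 Da tag: held to one charge state it gives one peak,
# whereas a tag that also took a second proton would split across two m/z values.
print(round(mz_protonated(300.0), 6))     # 301.007276
print(round(mz_protonated(300.0, 2), 6))  # 151.007276
```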

2. A method for determining the sequence of a nucleic acid molecule, comprising:

(a) generating tagged nucleic acid fragments which are complementary to a selected target nucleic acid molecule;

(b) separating the tagged fragments on the basis of fragment charge, sequential length, size or shape; and
(c) detecting the tags by potentiometry, and therefrom determining the sequence of the nucleic acid molecule.
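The readout in step (c) can be sketched as follows. The sketch assumes one distinct tag per nucleotide and fragments that share a common starting point, so ordering the detected tags by fragment length spells out the fragment strand base by base. The tag names are hypothetical. The reverse-complement step assumes the fragments were synthesized 5' to 3' on the target, which the claim does not state.

```python
# Hypothetical tag-to-nucleotide assignments (one distinct tag per base).
TAG_TO_BASE = {"tag_A": "A", "tag_C": "C", "tag_G": "G", "tag_T": "T"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fragments(detections):
    """detections: (fragment_length, tag_id) pairs, one per separated fragment.
    Returns the fragment-strand bases ordered by increasing fragment length."""
    ordered = sorted(detections)
    lengths = [length for length, _ in ordered]
    if len(set(lengths)) != len(lengths):
        raise ValueError("more than one tag detected at the same fragment length")
    return "".join(TAG_TO_BASE[tag] for _, tag in ordered)


def target_from_read(read):
    """Reverse complement: the target strand 5'->3', assuming the fragments
    were extended 5'->3' along the target."""
    return read.translate(COMPLEMENT)[::-1]


read = read_fragments([(21, "tag_G"), (19, "tag_A"), (20, "tag_T")])
print(read)                    # ATG
print(target_from_read(read))  # CAT
```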

3. A method for determining the sequence of a nucleic acid molecule, comprising:

(a) generating tagged nucleic acid fragments which are complementary to a selected target nucleic acid molecule [...];

(b) separating the tagged fragments on the basis of fragment charge, size or shape;

(c) cleaving the tags from the tagged fragments; and

(d) [...]

4. A compound of the formula T^ms-L-X wherein,

T^ms is an organic group detectable by mass spectrometry, comprising carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen, nitrogen, sulfur, phosphorus and iodine;

L is an organic group which allows a T^ms-containing moiety to be cleaved from the remainder of the compound, wherein the T^ms-containing moiety comprises a functional group which supports a single ionized charge state when the compound is subjected to mass spectrometry and is tertiary amine, quaternary amine or organic acid; and

X is a functional group selected from hydroxyl, amino, thiol, carboxylic acid, haloalkyl, and derivatives thereof which either activate or inhibit the activity of the group toward coupling with other moieties.

5. A compound of the formula T^ms-L-X wherein,

T^ms is an organic group detectable by mass spectrometry, with the proviso that T^ms does not comprise a sequence of reporter groups;

L is an organic group which allows a T^ms-containing moiety to be cleaved from the remainder of the compound, wherein the T^ms-containing moiety comprises a functional group which supports a single ionized charge state when the compound is subjected to mass spectrometry and is tertiary amine, quaternary amine or organic acid;

X is a nucleic acid;

where the compound is not bound to a solid support.

6. A compound according to any one of claims 4 or 5 wherein L is selected from L^hv, L^acid, L^base, L^[O], L^[R], L^enz, L^elc, L^Δ and L^ss, where actinic radiation, acid, base, oxidation, reduction, enzyme, electrochemical, thermal and thiol exchange, respectively, cause the T^ms-containing moiety to be cleaved from the remainder of the molecule.
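Claim 6 pairs its nine linker classes with nine cleavage conditions "respectively". A small lookup table makes that pairing explicit. The Python member names are hypothetical stand-ins for the claim's L superscripts.

```python
from enum import Enum


class CleavableLinker(Enum):
    """Linker classes of claim 6, each mapped to the condition that cleaves it."""
    HV = "actinic radiation"
    ACID = "acid"
    BASE = "base"
    OXIDATION = "oxidation"
    REDUCTION = "reduction"
    ENZYME = "enzyme"
    ELECTROCHEMICAL = "electrochemical"
    THERMAL = "thermal"
    THIOL_EXCHANGE = "thiol exchange"


def linker_for(condition: str) -> CleavableLinker:
    """Look up the linker class released by a given cleavage condition."""
    return CleavableLinker(condition)


print(linker_for("thiol exchange").name)  # THIOL_EXCHANGE
print(len(CleavableLinker))               # 9
```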

7. A composition comprising first and second compounds of the formula T^ms-L^hv-X wherein,

L^hv is a chemical group which, upon exposure to light of selected wavelength, allows cleavage of T^ms from X;

T^ms is an organic group which is detectable by mass spectrometry and comprises a variable mass component; and

X is nucleic acid; 

with the proviso that the variable mass component in the first compound has a mass that is not identical to the 
mass of the variable mass component of the second compound, but the groups in the first and second 
compounds are otherwise identical.
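Claim 7 requires only that the variable mass components of the two compounds differ. A practical tag set would presumably also need those masses to differ by more than the spectrometer can resolve. The sketch below flags pairs that fall under an assumed minimum separation. The masses and the 1 Da threshold are hypothetical.

```python
from itertools import combinations


def unresolved_pairs(tag_masses, min_separation):
    """Index pairs (i, j), i < j, whose tag masses differ by less than
    min_separation (same units as the masses)."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(tag_masses), 2)
        if abs(a - b) < min_separation
    ]


# Hypothetical variable-mass tags (Da) and a hypothetical 1 Da resolution limit.
masses = [300.1, 304.1, 300.6]
print(unresolved_pairs(masses, 1.0))  # [(0, 2)]
```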

8. A compound according to any one of claims 4 or 5 wherein L is L^hv, or a composition comprising compounds according to claim 7, wherein L^hv has the formula L^1-L^2-L^3 and -L^1-L^2 has the formula:




with one carbon atom at positions a, b, c, d or e being substituted with -L^3-X and optionally one or more of positions b, c, d or e being substituted with alkyl, alkoxy, fluoride, chloride, hydroxyl, carboxylate or amide; and R^1 is hydrogen or hydrocarbyl.

9. A compound or composition according to claim 8 wherein -L^3-X is located at position a.

10. A compound, or composition comprising compounds, according to any one of claims 4-9 where the compound comprises the formula:

[Structural formula not legible in this copy; surviving labels: Amide, (CH2)c, n]

wherein 

G is (CH2)1-6 wherein a hydrogen on one and only one of the CH2 groups of each G is replaced with -(CH2)c-Amide-T^4;

T^2 and T^4 are organic moieties of the formula C1-25N0-9O0-9S0-3P0-3HαFβIδ wherein the sum of α, β and δ is sufficient to satisfy the otherwise unsatisfied valencies of the C, N, O, S and P atoms;
Amide is -N(R^1)-C(=O)- or -C(=O)-N(R^1)-;

R^1 is hydrogen or C1-10 alkyl;
c is an integer ranging from 0 to 4; and 

n is an integer ranging from 1 to 50 such that when n is greater than 1, G, c, Amide and T^4 are independently selected.
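Claim 10 bounds the element counts of T^2 and T^4: C1-25, N0-9, O0-9, S0-3 and P0-3, with H, F and I filling the remaining valencies. The sketch below checks a formula against those ranges and computes its monoisotopic mass from standard (rounded) isotope masses. It does not test whether the H, F and I count satisfies the valencies of any particular structure. Triethylamine is an illustrative choice, not a moiety taken from this document.

```python
import re

# Element ranges claim 10 sets for T^2 and T^4; H, F and I fill the remaining valencies.
CLAIMED_RANGES = {"C": (1, 25), "N": (0, 9), "O": (0, 9), "S": (0, 3), "P": (0, 3)}
MONOVALENT = {"H", "F", "I"}
# Standard monoisotopic masses (Da), rounded.
MONOISOTOPIC = {
    "C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462,
    "S": 31.97207100, "P": 30.97376163, "F": 18.99840316, "I": 126.904473,
}


def parse_formula(formula):
    """Parse a simple formula such as 'C6H15N' into element counts."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"cannot parse formula {formula!r}")
    counts = {}
    for symbol, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(num) if num else 1)
    return counts


def within_claimed_ranges(counts):
    """True if only the claimed elements occur and C, N, O, S, P are in range."""
    if any(el not in CLAIMED_RANGES and el not in MONOVALENT for el in counts):
        return False
    return all(lo <= counts.get(el, 0) <= hi for el, (lo, hi) in CLAIMED_RANGES.items())


def monoisotopic_mass(counts):
    """Sum of monoisotopic atomic masses for the given element counts."""
    return sum(MONOISOTOPIC[el] * n for el, n in counts.items())


amine = parse_formula("C6H15N")  # triethylamine, a tertiary amine
print(within_claimed_ranges(amine))                    # True
print(round(monoisotopic_mass(amine), 5))              # 101.12045
print(within_claimed_ranges(parse_formula("C30H62")))  # False: 30 carbons
```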

11. A compound, or composition comprising compounds, according to any one of claims 4-9 where the compound comprises the formula:

[Structural formula not legible in this copy; surviving labels: Amide, (CH2)c]

wherein T^5 is an organic moiety of the formula C1-25N0-9O0-9S0-3P0-3HαFβIδ wherein the sum of α, β and δ is sufficient to satisfy the otherwise unsatisfied valencies of the C, N, O, S and P atoms, and includes a tertiary or quaternary amine or an organic acid; and m is an integer ranging from 0-49.

12. A compound, or composition comprising compounds, according to any one of claims 4-9 where the compound comprises the formula:




wherein T^5 is an organic moiety of the formula C1-25N0-9O0-9S0-3P0-3HαFβIδ wherein the sum of α, β and δ is sufficient to satisfy the otherwise unsatisfied valencies of the C, N, O, S and P atoms, and includes a tertiary or quaternary amine or an organic acid; and m is an integer ranging from 0-49.

13. A composition comprising a plurality of compounds according to any one of claims 4 or 5, or a composition of claim 7, wherein no two compounds in the composition have either the same T^ms or the same X.
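Claim 13 requires that no two compounds in the composition share either a T^ms or an X. If each compound is treated as a (tag, X) pair with hypothetical identifiers, the condition reduces to two uniqueness checks.

```python
def all_tags_and_xs_unique(compounds):
    """compounds: (tag_id, x_id) pairs describing T^ms-L-X compounds.
    True only if no tag and no X occurs in more than one compound."""
    tags = [tag for tag, _ in compounds]
    xs = [x for _, x in compounds]
    return len(set(tags)) == len(tags) and len(set(xs)) == len(xs)


# Hypothetical tag and primer identifiers.
print(all_tags_and_xs_unique([("T1", "primer_A"), ("T2", "primer_C")]))  # True
print(all_tags_and_xs_unique([("T1", "primer_A"), ("T1", "primer_C")]))  # False
```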

14. A compound according to claim 5, or a composition according to claim 7, wherein X is a DNA sequencing primer.

15. A compound of the formula (T^ms)n-L-X wherein,

T^ms is an organic group detectable by mass spectrometry, comprising carbon, at least one of hydrogen and fluoride, and optional atoms selected from oxygen, nitrogen, sulfur, phosphorus and iodine;

L is an organic group which allows a T^ms-containing moiety to be cleaved from the remainder of the compound, wherein the T^ms-containing moiety comprises a functional group which supports a single ionized charge state when the compound is subjected to mass spectrometry and is tertiary amine, quaternary amine or organic acid;

X is a functional group selected from hydroxyl, amino, thiol, carboxylic acid, haloalkyl, and derivatives thereof which either activate or inhibit the activity of the group toward coupling with other moieties; and

n is a number of T^ms groups, where n is an integer greater than one.

16. A compound of the formula (T^ms)n-L-X wherein,

T^ms is an organic group detectable by mass spectrometry, with the proviso that T^ms does not comprise reporter groups;

L is an organic group which allows a T^ms-containing moiety to be cleaved from the remainder of the compound, wherein the T^ms-containing moiety comprises a functional group which supports a single ionized charge state when the compound is subjected to mass spectrometry and is tertiary amine, quaternary amine or organic acid;

X is a nucleic acid; and

n is a number of T^ms groups, where n is an integer greater than one.

17. A compound of the formula T^ms-L-X wherein,

T^ms is an organic group detectable by mass spectrometry, with the proviso that T^ms does not comprise reporter groups;

L is an organic group which allows a T^ms-containing moiety to be cleaved from the remainder of the compound, wherein the T^ms-containing moiety comprises a functional group which supports a single ionized charge state when the compound is subjected to mass spectrometry and is tertiary amine, quaternary amine or organic acid; and

X is a molecule of interest (MOI) selected from proteins, peptides, antibodies or antibody fragments, receptors, receptor ligands, members of a ligand pair, cytokines, hormones, oligosaccharides, synthetic organic molecules, and drugs.

18. A compound according to claims 4, 5, 15, 16 or 17 wherein the tag is non-volatile and thermally labile.
FIGURE 3
FIGURE 4
FIGURE 7







FIGURE 8 